Recognition by Linear Combinations of Models


1  Modeling Objects by the Linear Combination of Images

1.1  Recognition  by  Alignment 

Visual object recognition requires the matching of an image with a set of models stored
in memory. Let $M = \{M_1, \ldots, M_n\}$ be the set of stored models, and $P$ be the image
to be recognized. In general, the viewed object, depicted by $P$, may differ from all the
previously seen images of the same object. It may be, for instance, the image of a
three-dimensional object seen from a novel viewing position. To compensate for these
variations, we may allow the models (or the viewed object) to undergo certain compensating
transformations during the matching stage. If $\mathcal{T}$ is the set of allowable
transformations, the matching stage requires the selection of a model $M_i \in M$ and a
transformation $T \in \mathcal{T}$, such that the viewed object $P$ and the transformed
model $T M_i$ will be as close as possible. The general scheme is called the alignment
approach, since an alignment transformation is applied to the model (or to the viewed
object) prior to, or during, the matching stage. Such an approach is used in [Chien &
Aggarwal 1987, Faugeras & Hebert 1986, Fischler & Bolles 1981, Huttenlocher & Ullman 1987,
Lowe 1985, Thompson & Mundy 1987, Ullman 1986]. Key problems that arise in any alignment
scheme are how to represent the set of different models $M$, what is the set of allowable
transformations $\mathcal{T}$, and, for a given model $M_i \in M$, how to determine the
transformation $T \in \mathcal{T}$ so as to minimize the difference between $P$ and
$T M_i$. For example, in the scheme proposed by Basri and Ullman [1988] a model is
represented by a set of 2-D contours, with associated depth and curvature values at each
contour point. The set of allowed transformations includes 3-D rotation, translation and
scaling, followed by an orthographic projection. The transformation is determined as in
[Huttenlocher & Ullman 1987, Ullman 1986, 1989] by identifying at least three
corresponding features (points or lines) in the image and the object.


In  this  paper  we  suggest  a  different  approach,  in  which  each  model  is  represented 
by the linear combination of 2-D images of the object. The new approach has several
advantages.  First,  it  handles  all  the  rigid  3-D  transformations,  but  it  is  not  restricted 
to  such  transformations.  Second,  there  is  no  need  in  this  scheme  to  explicitly  recover 
and  represent  the  3-D  structure  of  objects.  Third,  the  computations  involved  are  often 
simpler  than  in  previous  schemes. 

The  paper  is  divided  into  two  parts.  In  the  first  (section  1)  we  show  that  the  variety 
of  views  depicting  the  same  object  under  different  transformations  can  often  be  expressed 
as  the  linear  combinations  of  a  small  number  of  views.  In  the  second  part  (section  2)  we 
suggest  how  this  linear  combination  property  may  be  used  in  the  recognition  process. 


1.2  Using Linear Combinations of Images to Model Objects and Their Transformations

The  modeling  of  objects  using  linear  combinations  of  images  is  based  on  the  following 
observation.  For  many  continuous  transformations  of  interest  in  recognition,  such  as 
3-D  rotation,  translation  and  scaling,  all  the  possible  views  of  the  transforming  object 
can  be  expressed  simply  as  the  linear  combination  of  other  views  of  the  same  object. 
The  coefficients  of  these  linear  combinations  often  follow  in  addition  certain  functional 
restrictions.  In  the  next  two  sections  we  show  that  the  set  of  possible  images  of  an  object 
undergoing  rigid  3-D  transformations  and  scaling  is  embedded  in  a  linear  space,  spanned 
by  a  small  number  of  2-D  images. 


The images we will consider are 2-D edge maps produced in the image by the (orthographic)
projection of the bounding contours and other visible contours on 3-D objects.
We will make use of the following definitions. Given an object and a viewing direction,
the rim is the set of all the points on the object's surface whose normal is perpendicular
to the viewing direction [Koenderink & Van Doorn 1979]. This set is also called the
contour generator [Marr 1977]. A silhouette is an image generated by the orthographic
projection of the rim. In the analysis below we assume that every point along the
silhouette is generated by a single rim point. An edge map of an object usually contains
the silhouette, which is generated by its rim.


We will examine two cases below: the case of objects with sharp edges, and the case
of objects with smooth boundary contours. The difference between these two cases is
illustrated in Figure 1. For an object with sharp edges, such as the cube in Fig. 1 (a &
b), the rim is stable on the object as long as the edge is visible. In contrast, a rim that
is generated by smooth bounding surfaces, such as in the ellipsoid in Fig. 1 (c & d), is
not fixed on the object, but changes continuously with the viewpoint.


1.3  Objects  with  Sharp  Edges 

In  the  discussion  below  we  examine  the  case  of  objects  with  sharp  edges  undergoing 
different  transformations  followed  by  an  orthographic  projection.  In  each  case  we  show 
how  the  image  of  an  object  obtained  by  the  transformation  in  question  can  be  expressed  as 
the  linear  combination  of  a  small  number  of  pictures.  The  coefficients  of  this  combination 
may  be  different  for  the  x-  and  y-coordinates.  That  is,  the  intermediate  view  of  the  object 
may  be  given  by  two  linear  combinations,  one  for  the  x-coordinates  and  the  other  for  the 
y-coordinates.  In  addition,  certain  functional  restrictions  may  hold  among  the  different 
coefficients. 

To  introduce  the  scheme  we  first  apply  it  to  the  restricted  case  of  rotation  about  the 
vertical  axis,  then  examine  more  general  transformations. 


1.3.1  3-D  Rotation  Around  the  Vertical  Axis 


Let $P_1$ and $P_2$ be two images of an object $O$ rotating in depth around the vertical
axis ($Y$-axis). $P_2$ is obtained from $P_1$ following a rotation by an angle $\alpha$
($\alpha \neq k\pi$). Let $\hat{P}$ be a third image of the same object obtained from $P_1$
by a rotation of an angle $\theta$ around the vertical axis. The projections of a point
$p = (x, y, z) \in O$ in the three images are given by:

\[
\begin{aligned}
p_1 &= (x_1, y_1) = (x,\; y) \in P_1 \\
p_2 &= (x_2, y_2) = (x\cos\alpha + z\sin\alpha,\; y) \in P_2 \\
\hat{p} &= (\hat{x}, \hat{y}) = (x\cos\theta + z\sin\theta,\; y) \in \hat{P}
\end{aligned}
\]

Claim: Two scalars $a$ and $b$ exist, such that for every point $p \in O$:

\[ \hat{x} = a x_1 + b x_2 \]

with:

\[ a^2 + b^2 + 2ab\cos\alpha = 1 \]

Proof: The scalars $a$ and $b$ are given explicitly by:

\[ a = \frac{\sin(\alpha - \theta)}{\sin\alpha}, \qquad b = \frac{\sin\theta}{\sin\alpha} \]



Figure 1: Changes in the rim during rotation. (a) A bird's eye view of a cube. (b) The cube
after rotation. In both (a) and (b) points p, q lie on the rim. (c) A bird's eye view of an
ellipsoid. (d) The ellipsoid after rotation. The rim points p, q in (c) are replaced by p', q'
in (d). (e) An ellipsoid in a frontal view. (f) The rotated ellipsoid (outer), superimposed on
the appearance of the rim, as a planar space curve after rotation by the same amount (inner).
(From [Basri & Ullman, 1988].)




Then:

\[
a x_1 + b x_2
= \frac{\sin(\alpha - \theta)}{\sin\alpha}\, x
+ \frac{\sin\theta}{\sin\alpha}\,(x\cos\alpha + z\sin\alpha)
= x\cos\theta + z\sin\theta = \hat{x}
\]


Therefore, an image of an object rotating around the vertical axis is always a linear
combination of two model images. It is straightforward to verify that the coefficients $a$
and $b$ satisfy the above constraint. It is worth noting that the new view is not restricted
to be an intermediate view (that is, the rotation angle $\theta$ may be larger than
$\alpha$). Finally, it should be noted that we do not deal at this stage with occlusion; we
assume here that the same set of points is visible in the different views.
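The claim can be checked numerically. The following is a minimal sketch, not part of the original derivation: the object points, the angles and the helper names `rotate_y_project` and `two_view_coefficients` are assumptions chosen only for illustration. It builds two model views and a third view of the same points, then compares the third view with the linear combination given in the proof.

```python
import math

def rotate_y_project(p, angle):
    """Orthographic (x, y) image of point p after a rotation about the Y axis."""
    x, y, z = p
    return (x * math.cos(angle) + z * math.sin(angle), y)

def two_view_coefficients(alpha, theta):
    """Coefficients (a, b) from the proof; requires sin(alpha) != 0."""
    a = math.sin(alpha - theta) / math.sin(alpha)
    b = math.sin(theta) / math.sin(alpha)
    return a, b

# Hypothetical object points (x, y, z), chosen only for illustration.
points = [(1.0, 0.5, 2.0), (-0.3, 1.2, 0.7), (0.8, -1.0, -1.5)]
alpha = math.radians(30)   # rotation separating the two model views
theta = math.radians(50)   # rotation of the new view (larger than alpha)
a, b = two_view_coefficients(alpha, theta)

max_error = 0.0
for p in points:
    x1, _ = rotate_y_project(p, 0.0)
    x2, _ = rotate_y_project(p, alpha)
    x_new, _ = rotate_y_project(p, theta)
    max_error = max(max_error, abs(a * x1 + b * x2 - x_new))

# Left-hand side of the functional constraint; it should equal 1.
constraint = a * a + b * b + 2 * a * b * math.cos(alpha)
```

Here `max_error` measures how far the combination is from the true new view, and `constraint` evaluates the left-hand side of the restriction on $a$ and $b$.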


1.3.2  Linear  Transformations  in  3-D  Space 


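Let $O$ be a set of object points. Let $P_1$, $P_2$ and $P_3$ be three images of $O$,
obtained by applying $3 \times 3$ matrices $R$, $S$ and $T$ to $O$, respectively. (In
particular, $R$ can be the identity matrix, and $S$, $T$ two rotations producing the second
and third views.) Let $\hat{P}$ be a fourth image of the same object obtained by applying a
different $3 \times 3$ matrix $U$ to $O$. Let $\mathbf{r}_1$, $\mathbf{s}_1$, $\mathbf{t}_1$
and $\mathbf{u}_1$ be the first row vectors of $R$, $S$, $T$ and $U$, respectively, and let
$\mathbf{r}_2$, $\mathbf{s}_2$, $\mathbf{t}_2$ and $\mathbf{u}_2$ be the second row vectors
of $R$, $S$, $T$ and $U$, respectively. The positions of a point $p \in O$ in the four
images are given by:

\[
\begin{aligned}
p_1 &= (x_1, y_1) = (\mathbf{r}_1 p,\; \mathbf{r}_2 p) \\
p_2 &= (x_2, y_2) = (\mathbf{s}_1 p,\; \mathbf{s}_2 p) \\
p_3 &= (x_3, y_3) = (\mathbf{t}_1 p,\; \mathbf{t}_2 p) \\
\hat{p} &= (\hat{x}, \hat{y}) = (\mathbf{u}_1 p,\; \mathbf{u}_2 p)
\end{aligned}
\]

Claim: If both sets $\{\mathbf{r}_1, \mathbf{s}_1, \mathbf{t}_1\}$ and
$\{\mathbf{r}_2, \mathbf{s}_2, \mathbf{t}_2\}$ are linearly independent, then there exist
scalars $a_1$, $a_2$, $a_3$ and $b_1$, $b_2$, $b_3$ such that for every point $p \in O$ it
holds that:

\[
\begin{aligned}
\hat{x} &= a_1 x_1 + a_2 x_2 + a_3 x_3 \\
\hat{y} &= b_1 y_1 + b_2 y_2 + b_3 y_3
\end{aligned}
\]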

Proof: $\{\mathbf{r}_1, \mathbf{s}_1, \mathbf{t}_1\}$ are linearly independent. Therefore,
they span $\mathbb{R}^3$, and there exist scalars $a_1$, $a_2$ and $a_3$ such that:

\[ \mathbf{u}_1 = a_1 \mathbf{r}_1 + a_2 \mathbf{s}_1 + a_3 \mathbf{t}_1 \]

Since:

\[ \hat{x} = \mathbf{u}_1 p \]

It follows that:

\[ \hat{x} = a_1 \mathbf{r}_1 p + a_2 \mathbf{s}_1 p + a_3 \mathbf{t}_1 p \]

Therefore:

\[ \hat{x} = a_1 x_1 + a_2 x_2 + a_3 x_3 \]

In a similar way we obtain that:

\[ \hat{y} = b_1 y_1 + b_2 y_2 + b_3 y_3 \]

Therefore, an image of an object undergoing a linear transformation in 3-D space is
a linear combination of three model images.
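The argument above can be sketched in code. This is an illustrative check under stated assumptions, not the authors' implementation: the matrices and points are hypothetical, and the coefficients are found with Cramer's rule through the helper `solve_combination`, a name introduced here. Only the first two rows of each matrix are stored, since the third row does not affect an orthographic projection.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve_combination(r, s, t, u):
    """Return [a1, a2, a3] with u = a1*r + a2*s + a3*t (Cramer's rule)."""
    base = [[r[i], s[i], t[i]] for i in range(3)]  # columns are r, s, t
    d = det3(base)
    if abs(d) < 1e-12:
        raise ValueError("r, s, t are not linearly independent")
    coeffs = []
    for k in range(3):
        m = [row[:] for row in base]
        for i in range(3):
            m[i][k] = u[i]  # replace column k by the target vector
        coeffs.append(det3(m) / d)
    return coeffs

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

# Hypothetical views, each stored as (row producing x, row producing y).
R = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
S = [(0.9, 0.1, 0.4), (-0.2, 1.0, 0.3)]
T = [(0.8, -0.3, 0.5), (0.1, 0.9, -0.4)]
U = [(0.5, 0.7, -0.2), (0.3, -0.6, 0.9)]  # the novel view

a = solve_combination(R[0], S[0], T[0], U[0])  # x-coefficients
b = solve_combination(R[1], S[1], T[1], U[1])  # y-coefficients

points = [(1.0, 0.5, 2.0), (-0.3, 1.2, 0.7), (0.8, -1.0, -1.5), (2.0, 0.1, 0.4)]
max_error = 0.0
for p in points:
    xs = [dot(M[0], p) for M in (R, S, T)]
    ys = [dot(M[1], p) for M in (R, S, T)]
    x_hat, y_hat = dot(U[0], p), dot(U[1], p)
    max_error = max(max_error,
                    abs(dot(a, xs) - x_hat),
                    abs(dot(b, ys) - y_hat))
```

The same coefficients `a` and `b` serve every point of the object, which is what makes the combination usable as a model.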

1.3.3  General  Rotation  in  3-D  Space 

Rotation  is  a  nonlinear  subgroup  of  the  linear  transformations.  Therefore,  an  image  of 
a  rotating  object  is  still  a  linear  combination  of  three  model  images.  However,  not  every 
point  in  this  linear  space  represents  a  pure  rotation  of  the  object.  Indeed,  we  can  show 
that  only  points  that  satisfy  the  following  three  constraints  represent  images  of  a  rotating 
object. 

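Claim: The coefficients of an image of a rotating object must satisfy the three following
constraints:

\[
\begin{aligned}
\| a_1 \mathbf{r}_1 + a_2 \mathbf{s}_1 + a_3 \mathbf{t}_1 \| &= 1 \\
\| b_1 \mathbf{r}_2 + b_2 \mathbf{s}_2 + b_3 \mathbf{t}_2 \| &= 1 \\
(a_1 \mathbf{r}_1 + a_2 \mathbf{s}_1 + a_3 \mathbf{t}_1) \cdot
(b_1 \mathbf{r}_2 + b_2 \mathbf{s}_2 + b_3 \mathbf{t}_2) &= 0
\end{aligned}
\]

Proof: $U$ is a rotation matrix. Therefore:

\[
\begin{aligned}
\| \mathbf{u}_1 \| &= 1 \\
\| \mathbf{u}_2 \| &= 1 \\
\mathbf{u}_1 \cdot \mathbf{u}_2 &= 0
\end{aligned}
\]

The required terms are obtained directly by substituting $\mathbf{u}_1$ and $\mathbf{u}_2$
with the appropriate linear combinations. It also follows immediately that if the
constraints are met, then the new view represents a possible rotation of the object.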
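A minimal sketch of how the three constraints might be tested is given below. It assumes that the model rotations $R$, $S$ and $T$ are known; the particular rotations, the tolerance and the function name `is_rotation_view` are illustrative assumptions, not part of the original scheme. The unit-norm conditions are checked through squared norms, which is equivalent.

```python
import math

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def combine(coeffs, vectors):
    """Linear combination sum_k coeffs[k] * vectors[k] of 3-vectors."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(3)]

def is_rotation_view(a, b, R, S, T, tol=1e-9):
    """Check the three constraints for x-coefficients a and y-coefficients b."""
    u1 = combine(a, (R[0], S[0], T[0]))
    u2 = combine(b, (R[1], S[1], T[1]))
    return (abs(dot(u1, u1) - 1.0) < tol      # ||u1|| = 1
            and abs(dot(u2, u2) - 1.0) < tol  # ||u2|| = 1
            and abs(dot(u1, u2)) < tol)       # u1 . u2 = 0

# Hypothetical model views: identity plus two compound rotations.
deg30 = math.radians(30)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S = matmul(rot_y(deg30), rot_x(deg30))
T = matmul(rot_x(deg30), rot_y(deg30))

# Coefficients that reproduce the second model view: a pure rotation.
second_view_ok = is_rotation_view((0, 1, 0), (0, 1, 0), R, S, T)
# Coefficients that stretch the first view along x: linear but not rigid.
stretched_ok = is_rotation_view((1.2, 0, 0), (1, 0, 0), R, S, T)
```

The second coefficient set still lies in the linear space spanned by the model views, but it fails the test because it does not correspond to a rotation.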

These functional constraints are second degree polynomials in the coefficients, and
therefore span a nonlinear manifold within the linear subspace. In order to check whether
a specific set of coefficients represents a rigid rotation, the values of the matrices $R$,
$S$ and $T$ are required. These can be retrieved by applying methods of "structure from
motion" to the model views. Ullman [1979] showed that in the case of rigid transformations
four corresponding points in three views are sufficient. A linear algorithm that can be used
to recover the rotation matrices has been suggested by Huang & Lee [1989]. (The same
method can be extended to deal with scale changes, in addition to the rotation.)

It should be noted that in some cases the explicit computation of the rotation matrices
will not be necessary. First, if the set of allowable object transformations includes the
entire set of linear 3-D transformations (including non-rigid stretch and shear), then
no additional test of the coefficients is required. Second, if the transformations are
constrained to be rigid, but the test of the coefficients is not performed, then the penalty
may be some "false positive" misidentifications. If the image of one object happens to
be identical to the projection of a (non-rigid) linear transformation applied to another
object, then the two will be confusable. If the objects contain a sufficient number of
points (five or more), the likelihood of such an ambiguity becomes negligible. Finally, it is
worth noting that it is also possible to determine the coefficients of the constraint
equations above without computing the rotation matrices, by using a number of additional
views (see also section 1.3.5).

Regarding the independence condition mentioned above, for many triplets of rotation
matrices $R$, $S$ and $T$ both $\{\mathbf{r}_1, \mathbf{s}_1, \mathbf{t}_1\}$ and
$\{\mathbf{r}_2, \mathbf{s}_2, \mathbf{t}_2\}$ will in fact be linearly independent.
It will therefore be possible to select a non-degenerate triplet of views ($P_1$, $P_2$ and
$P_3$), in terms of which intermediate views are expressible as linear combinations. Note,
however, that in the special case that $R$ is the identity matrix, $S$ is a pure rotation
about the $X$-axis, and $T$ about the $Y$-axis, the independence condition does not hold.


1.3.4  Rigid  Transformations  and  Scaling  in  3-D  Space 

Rotation, translation and scaling in 3-D space can be represented as linear transformations
in 4-D space using homogeneous coordinates. Therefore, an image of a rigid object
can be expressed as the linear combination of four model images. In fact, only three
different snapshots of the object are required; the fourth view can be derived from them.

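Let $O$ be a set of object points. Let $P_1$, $P_2$ and $P_3$ be three images of $O$,
obtained by applying the $3 \times 3$ rotation matrices $R$, $S$ and $T$ to $O$,
respectively. Let $\hat{P}$ be a fourth image of the same object obtained by applying a
$3 \times 3$ rotation matrix $U$ to $O$, scaling by a scale factor $s$, and translating by
a vector $(t_x, t_y)$. Let $\mathbf{r}_1$, $\mathbf{s}_1$, $\mathbf{t}_1$ and $\mathbf{u}_1$
be again the first row vectors of $R$, $S$, $T$ and $U$, and $\mathbf{r}_2$, $\mathbf{s}_2$,
$\mathbf{t}_2$ and $\mathbf{u}_2$ the second row vectors of $R$, $S$, $T$ and $U$,
respectively. For any point $p \in O$, its positions in the four images are given by:

\[
\begin{aligned}
p_1 &= (x_1, y_1) = (\mathbf{r}_1 p,\; \mathbf{r}_2 p) \\
p_2 &= (x_2, y_2) = (\mathbf{s}_1 p,\; \mathbf{s}_2 p) \\
p_3 &= (x_3, y_3) = (\mathbf{t}_1 p,\; \mathbf{t}_2 p) \\
\hat{p} &= (\hat{x}, \hat{y}) = (s\,\mathbf{u}_1 p + t_x,\; s\,\mathbf{u}_2 p + t_y)
\end{aligned}
\]

Claim: If both sets $\{\mathbf{r}_1, \mathbf{s}_1, \mathbf{t}_1\}$ and
$\{\mathbf{r}_2, \mathbf{s}_2, \mathbf{t}_2\}$ are linearly independent, then there exist
scalars $a_1$, $a_2$, $a_3$, $a_4$ and $b_1$, $b_2$, $b_3$, $b_4$ such that for every point
$p \in O$ it holds that:

\[
\begin{aligned}
\hat{x} &= a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 \\
\hat{y} &= b_1 y_1 + b_2 y_2 + b_3 y_3 + b_4
\end{aligned}
\]

with the coefficients satisfying the two constraints:

\[
\begin{aligned}
\| a_1 \mathbf{r}_1 + a_2 \mathbf{s}_1 + a_3 \mathbf{t}_1 \| &=
\| b_1 \mathbf{r}_2 + b_2 \mathbf{s}_2 + b_3 \mathbf{t}_2 \| \\
(a_1 \mathbf{r}_1 + a_2 \mathbf{s}_1 + a_3 \mathbf{t}_1) \cdot
(b_1 \mathbf{r}_2 + b_2 \mathbf{s}_2 + b_3 \mathbf{t}_2) &= 0
\end{aligned}
\]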

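Proof: $\{\mathbf{r}_1, \mathbf{s}_1, \mathbf{t}_1\}$ are linearly independent. Therefore,
they span $\mathbb{R}^3$, and there exist scalars $c_1$, $c_2$ and $c_3$ such that:

\[ \mathbf{u}_1 = c_1 \mathbf{r}_1 + c_2 \mathbf{s}_1 + c_3 \mathbf{t}_1 \]

Since:

\[ \hat{x} = s(\mathbf{u}_1 p) + t_x \]

Then:

\[ \hat{x} = s c_1 \mathbf{r}_1 p + s c_2 \mathbf{s}_1 p + s c_3 \mathbf{t}_1 p + t_x \]

Let:

\[ a_1 = s c_1, \qquad a_2 = s c_2, \qquad a_3 = s c_3, \qquad a_4 = t_x \]

We obtain that:

\[ \hat{x} = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 \]

In a similar way we obtain that:

\[ \hat{y} = b_1 y_1 + b_2 y_2 + b_3 y_3 + b_4 \]

$U$ is a rotation matrix, therefore:

\[
\begin{aligned}
\| \mathbf{u}_1 \| &= 1 \\
\| \mathbf{u}_2 \| &= 1 \\
\mathbf{u}_1 \cdot \mathbf{u}_2 &= 0
\end{aligned}
\]

It follows that:

\[
\begin{aligned}
\| s\mathbf{u}_1 \| &= \| s\mathbf{u}_2 \| \\
(s\mathbf{u}_1) \cdot (s\mathbf{u}_2) &= 0
\end{aligned}
\]

The constraints are obtained directly by substituting the appropriate linear combinations
for $s\mathbf{u}_1$ and $s\mathbf{u}_2$.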
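As a reading aid, the final substitution can be written out explicitly. The scalars $d_1$, $d_2$, $d_3$ are introduced here only for illustration; they play the role for $\mathbf{u}_2$ that $c_1$, $c_2$, $c_3$ play for $\mathbf{u}_1$, so that $b_i = s d_i$ for $i = 1, 2, 3$ and $b_4 = t_y$.

\[
\begin{aligned}
a_1 \mathbf{r}_1 + a_2 \mathbf{s}_1 + a_3 \mathbf{t}_1
  &= s\,(c_1 \mathbf{r}_1 + c_2 \mathbf{s}_1 + c_3 \mathbf{t}_1) = s\,\mathbf{u}_1 \\
b_1 \mathbf{r}_2 + b_2 \mathbf{s}_2 + b_3 \mathbf{t}_2
  &= s\,(d_1 \mathbf{r}_2 + d_2 \mathbf{s}_2 + d_3 \mathbf{t}_2) = s\,\mathbf{u}_2 \\
\| a_1 \mathbf{r}_1 + a_2 \mathbf{s}_1 + a_3 \mathbf{t}_1 \| &= |s|, \qquad (a_4, b_4) = (t_x, t_y)
\end{aligned}
\]

Under these assumptions the magnitude of the scale factor and the image translation can be read directly off the coefficients of a matched view.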

1.3.5  Using  Two  Views  Only 

In  the  scheme  described  above,  any  image  of  a  given  object  (within  a  certain  range  of 
rotations)  is  expressed  as  the  linear  combination  of  three  fixed  views  of  the  object.  For 
general  linear  transformations,  it  is  also  possible  to  use  instead  just  two  views  of  the 
object.  (This  observation  was  made  independently  by  T.  Poggio  and  R.  Basri.) 

Let $O$ be again a rigid object (a collection of 3-D points). $P_1$ is a 2-D image of $O$,
and $P_2$ the image of $O$ following a rotation by $R$ (a $3 \times 3$ matrix). We will
denote by $\mathbf{r}_1$, $\mathbf{r}_2$, $\mathbf{r}_3$ the three rows of $R$, and by
$\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ the three rows of the identity matrix. For
a given 3-D point $p$ in $O$, its coordinates $(x_1, y_1)$ in the first view are
$x_1 = \mathbf{e}_1 p$, $y_1 = \mathbf{e}_2 p$. Its coordinates $(x_2, y_2)$ in the second
view are given by: $x_2 = \mathbf{r}_1 p$, $y_2 = \mathbf{r}_2 p$.

Consider now any other view obtained by applying another $3 \times 3$ matrix $U$ to the
points of $O$. The coordinates $(\hat{x}, \hat{y})$ of $p$ in this new view will be:

\[ \hat{x} = \mathbf{u}_1 p, \qquad \hat{y} = \mathbf{u}_2 p \]

(where $\mathbf{u}_1$, $\mathbf{u}_2$ are the first and second rows of $U$, respectively).

Assuming that $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{r}_1$ span $\mathbb{R}^3$ (see
below), then:

\[ \mathbf{u}_1 = a_1 \mathbf{e}_1 + a_2 \mathbf{e}_2 + a_3 \mathbf{r}_1 \]

for some scalars $a_1$, $a_2$, $a_3$. Therefore:

\[
\hat{x} = \mathbf{u}_1 p = (a_1 \mathbf{e}_1 + a_2 \mathbf{e}_2 + a_3 \mathbf{r}_1)\, p
= a_1 x_1 + a_2 y_1 + a_3 x_2
\]

This equality holds for every point $p$ in $O$. Let $\mathbf{x}_1$ be the vector of all the
x-coordinates of the points in the first view, $\mathbf{x}_2$ in the second,
$\hat{\mathbf{x}}$ in the third, and $\mathbf{y}_1$ the vector of y-coordinates in the
first view. Then:

\[ \hat{\mathbf{x}} = a_1 \mathbf{x}_1 + a_2 \mathbf{y}_1 + a_3 \mathbf{x}_2 \]

Here $\mathbf{x}_1$, $\mathbf{y}_1$ and $\mathbf{x}_2$ are used as a basis for all of the
views. For any other image of the same object, its vector $\hat{\mathbf{x}}$ of
x-coordinates is the linear combination of these basis vectors.

Similarly, for the y-coordinates:

\[ \hat{\mathbf{y}} = b_1 \mathbf{x}_1 + b_2 \mathbf{y}_1 + b_3 \mathbf{x}_2 \]

The vector $\hat{\mathbf{y}}$ of y-coordinates in the new image is therefore also the
linear combination of the same three basis vectors. In this version the basis vectors are
the same for the x- and y-coordinates, and they are obtained from two rather than three
views. One can view the situation as follows. Within an n-dimensional space, the vectors
$\mathbf{x}_1$, $\mathbf{y}_1$, $\mathbf{x}_2$ span a 3-dimensional subspace. For all the
images of the object in question, the vectors of both the x- and y-coordinates must reside
within this 3-dimensional subspace.

Instead of using $(\mathbf{e}_1, \mathbf{e}_2, \mathbf{r}_1)$ as the basis for
$\mathbb{R}^3$ we could also use $(\mathbf{e}_1, \mathbf{e}_2, \mathbf{r}_2)$. One of
these bases spans $\mathbb{R}^3$, unless the rotation $R$ is a pure rotation around the
line of sight.
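The two-view construction can be illustrated with a short sketch. The values below are assumptions chosen for illustration: the second view is taken to be a 30 degree rotation about the Y-axis, the novel view is an arbitrary linear transformation, and `two_view_coefficients` is a helper name introduced here. Because the first two basis vectors are rows of the identity, the coefficients have a simple closed form.

```python
import math

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def two_view_coefficients(u, r1):
    """Solve u = c1*e1 + c2*e2 + c3*r1 in closed form (needs r1[2] != 0)."""
    if abs(r1[2]) < 1e-12:
        raise ValueError("e1, e2 and r1 do not span R^3")
    c3 = u[2] / r1[2]
    return (u[0] - c3 * r1[0], u[1] - c3 * r1[1], c3)

# Hypothetical second view: first row of a 30 degree rotation about Y.
angle = math.radians(30)
r1 = (math.cos(angle), 0.0, math.sin(angle))

# Hypothetical novel view: first two rows of a general linear map U.
U = [(0.7, -0.4, 0.5), (0.3, 0.9, -0.2)]
a = two_view_coefficients(U[0], r1)  # coefficients for the x-coordinates
b = two_view_coefficients(U[1], r1)  # coefficients for the y-coordinates

points = [(1.0, 0.5, 2.0), (-0.3, 1.2, 0.7), (0.8, -1.0, -1.5), (2.0, 0.1, 0.4)]
max_error = 0.0
for p in points:
    basis = (p[0], p[1], dot(r1, p))  # (x1, y1, x2) for this point
    max_error = max(max_error,
                    abs(dot(a, basis) - dot(U[0], p)),
                    abs(dot(b, basis) - dot(U[1], p)))
```

Both coordinate vectors of the novel view are combinations of the same three basis measurements, which is the point of the two-view variant.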

The use of two views described above is applicable to general linear transformations
of the object, and, without additional constraints, it is impossible to distinguish between
rigid and linear but not rigid transformations of the object. To impose rigidity (with
possible scaling) the coefficients $(a_1, a_2, a_3, b_1, b_2, b_3)$ must meet two simple
constraints. Since $U$ is now a rotation matrix (with possible scaling),

\[
\begin{aligned}
\mathbf{u}_1 \cdot \mathbf{u}_2 &= 0 \\
\| \mathbf{u}_1 \| &= \| \mathbf{u}_2 \|
\end{aligned}
\]

In terms of the coefficients $a_i$, $b_i$, $\mathbf{u}_1 \cdot \mathbf{u}_2 = 0$ implies:

\[ a_1 b_1 + a_2 b_2 + a_3 b_3 + (a_1 b_3 + a_3 b_1)\, r_{11} + (a_2 b_3 + a_3 b_2)\, r_{12} = 0 \]

The second constraint implies:

\[
a_1^2 + a_2^2 + a_3^2 - b_1^2 - b_2^2 - b_3^2
= 2(b_1 b_3 - a_1 a_3)\, r_{11} + 2(b_2 b_3 - a_2 a_3)\, r_{12}
\]

A third view can therefore be used to recover, using two linear equations, the values
of $r_{11}$ and $r_{12}$. ($r_{11}$ and $r_{12}$ can in fact be determined to within a
scale factor from the first two views, so only one additional equation is required.) The
full scheme for rigid objects is then the following. Given an image, determine whether the
vectors $\hat{\mathbf{x}}$, $\hat{\mathbf{y}}$ are linear combinations of $\mathbf{x}_1$,
$\mathbf{y}_1$ and $\mathbf{x}_2$. Only two views are required for this stage. Using
the values of $r_{11}$ and $r_{12}$, test whether the coefficients $a_i$, $b_i$
($i = 1, 2, 3$) satisfy the two constraints above.
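The rigidity test in the second stage can be sketched as follows. This is an illustration under stated assumptions rather than the full scheme: $r_{11}$ and $r_{12}$ are taken as known from a hypothetical rotation $R$ instead of being recovered from a third view, and the helper names are introduced here. The two residuals are the left-hand side of the first constraint and the difference between the two sides of the second.

```python
import math

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def two_view_coefficients(u, r1):
    """Solve u = c1*e1 + c2*e2 + c3*r1 in closed form (needs r1[2] != 0)."""
    c3 = u[2] / r1[2]
    return (u[0] - c3 * r1[0], u[1] - c3 * r1[1], c3)

def rigidity_residuals(a, b, r11, r12):
    """Residuals of the orthogonality and equal-norm constraints."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    orth = (a1 * b1 + a2 * b2 + a3 * b3
            + (a1 * b3 + a3 * b1) * r11 + (a2 * b3 + a3 * b2) * r12)
    norm = (a1 ** 2 + a2 ** 2 + a3 ** 2 - b1 ** 2 - b2 ** 2 - b3 ** 2
            - 2 * (b1 * b3 - a1 * a3) * r11 - 2 * (b2 * b3 - a2 * a3) * r12)
    return orth, norm

# Hypothetical second model view: a 30 degree rotation about Y.
r1 = rot_y(math.radians(30))[0]

def residuals_for(U):
    a = two_view_coefficients(U[0], r1)
    b = two_view_coefficients(U[1], r1)
    return rigidity_residuals(a, b, r1[0], r1[1])

# A rotation scaled by 1.2 (rigid with scaling) and a shear (not rigid).
rotation = matmul(rot_x(math.radians(20)), rot_y(math.radians(10)))
U_rigid = [[1.2 * v for v in row] for row in rotation]
U_shear = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

rigid_res = residuals_for(U_rigid)
shear_res = residuals_for(U_shear)
```

Both residuals vanish (up to rounding) for the scaled rotation, while the shear leaves them clearly non-zero, so the view would be rejected as a rigid transformation of the model.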

It is of interest to compare this use of two views to structure-from-motion (SFM)
techniques for recovering 3-D structure from orthographic projections. It is well known
that three distinct views are required; two are insufficient [Ullman 1979]. Given only two
views and an infinitesimal rotation (the velocity field), the 3-D structure can be recovered
to within depth-scaling [Ullman 1983]. It is also straightforward to establish that if the
two views are separated by a general affine transformation of the 3-D object (rather
than a rigid one), then the structure of the object can be recovered to within an affine
transformation.

Our  use  of  two  views  above  for  the  purpose  of  recognition  is  thus  related  to  known 
results  regarding  the  recovery  of  structure  from  motion.  Two  views  are  sufficient  to 
determine  the  object’s  structure  to  within  an  affine  transformation,  and  three  are  required 
to recover the full 3-D structure of a rigidly moving object. It can also be observed that
an extension of the scheme above can be used to recover structure from motion. It was
shown how the scheme can be used to recover $r_{11}$ and $r_{12}$. $r_{21}$ and $r_{22}$
can be recovered in a similar manner. Consequently, it becomes possible to recover 3-D
structure and motion in space based on three orthographic views, using linear equations.

1.3.6  Summary 

In  this  section  we  have  shown  that  an  object  with  sharp  contours,  undergoing  rigid 
transformations  and  scaling  in  3-D  space  followed  by  an  orthographic  projection,  can  be 
expressed  as  the  linear  combination  of  four  images  of  the  same  object.  In  this  scheme, 
the  model  of  a  3-D  object  consists  of  a  number  of  2-D  pictures  of  it.  The  pictures  are  in 
correspondence,  in  the  sense  that  it  is  known  which  are  the  corresponding  points  in  the 
different  pictures.  Two  images  are  sufficient  to  represent  general  linear  transformations 
of  the  object.  Three  images  are  required  to  represent  rotations  in  3-D  space,  and  one 
additional  image  is  required  to  represent  translations.  The  scaling  does  not  require  any 
additional image, since it is represented by a scaling of the coefficients. As mentioned
above, the fourth picture can be generated internally; therefore only three different
snapshots of the object are required.

The  linear  combination  scheme  assumes  that  the  same  object  points  are  visible  in 
the  different  views.  When  the  views  are  sufficiently  different,  this  will  no  longer  hold, 
due  to  self-occlusion.  To  represent  an  object  from  all  possible  viewing  directions  (e.g. 
both  “front”  and  “back”),  a  number  of  different  models  of  this  type  will  be  required. 
This notion is similar to the use of different object aspects suggested by Koenderink &
Van Doorn [1979]. (Other aspects of occlusion are examined in the final discussion and
Appendix D.)

The  linear  combination  scheme  described  above  was  implemented  and  applied  first 
to  artificially  created  images.  Figure  2  shows  examples  of  object  models  and  their  linear 
combinations.  The  figure  shows  how  3-D  similarity  transformations  can  be  represented 
by  the  linear  combinations  of  four  images. 


Figure 2: (a) Three model pictures of a cube. The second picture was obtained by rotating
the cube by 30° around the X-axis, then by 30° around the Y-axis. The third picture was
obtained by rotating the cube by 30° around the Y-axis, then by 30° around the X-axis. (b)
Three model pictures of a pyramid taken with the same transformations as the pictures in
(a). (c) Two linear combinations of the cube model. The left picture was obtained using the
following parameters: the x-coefficients are (0.343, -2.618, 2.989, 0), and the
y-coefficients are (0.630, -2.533, 2.658, 0), which correspond to a rotation of the cube by
10°, 20° and 45° around the X-, Y- and Z-axes respectively. The right picture was obtained
using the following parameters: x-coefficients (0.455, 3.392, -3.241, 0.25), y-coefficients
(0.542, 3.753, -3.343, -0.15). These coefficients correspond to a rotation of the cube by
20°, 10° and -45° around the X-, Y- and Z-axes respectively, followed by a scaling of factor
1.2, and a translation of (25, -15) pixels. (d) Two linear combinations of the pyramid model
taken with the same parameters as the pictures in (c).

1.4  Objects with Smooth Boundaries


The case of objects with smooth boundaries is identical to the case of objects with sharp
edges as long as we deal with translation, scaling and image rotation. The difference
arises when the object rotates in 3-D space. This case is discussed in [Basri & Ullman,
1988], where we have suggested a method for predicting the appearance of such objects
following 3-D rotations. This method, called "the curvature method", is summarized
briefly below.

A model is represented by a set of 2-D contours. Each point $p = (x, y)$ along the
contours is labeled with its depth value $z$, and a curvature value $r$. The curvature value
is the length of a curvature vector at $p$, $r = \|(r_x, r_y)\|$. ($r_x$ is the surface's
radius of curvature at $p$ in a planar section in the $X$ direction, $r_y$ in the $Y$
direction.) This vector is normal to the contour at $p$. Let $V_\phi$ be an axis lying in
the image plane and forming an angle $\phi$ with the positive $X$ direction, and
$\mathbf{r}_\phi$ be a vector of length $r_\phi = r_y\cos\phi - r_x\sin\phi$ and
perpendicular to $V_\phi$. When the object is rotated around $V_\phi$ we approximate the
new position of the point $p$ in the image by:

\[ p' = R(p - \mathbf{r}_\phi) + \mathbf{r}_\phi \tag{1} \]

where $R$ is the rotation matrix. The equation has the following meaning. When viewed
in a cross section perpendicular to the rotation axis $V_\phi$, the surface at $p$ can be
approximated by a circular arc with radius $r_\phi$ and center at $p - \mathbf{r}_\phi$.
The new rim point $p'$ is obtained by first applying $R$ to this center of curvature
$(p - \mathbf{r}_\phi)$, then adding the radius of curvature $\mathbf{r}_\phi$. This
expression is precise for circular arcs, and gives a good approximation for other surfaces
provided that the angle of rotation is not too large (see [Basri & Ullman 1988] for
details). The depth and the curvature values were estimated in [Basri & Ullman 1988] using
three pictures of the object, and the results were improved using five pictures. In this
section we show how the curvature method can also be replaced by linear combinations of a
small number of pictures. In particular, we use three images to represent rotations around
the vertical axis, and five images for general rotations in 3-D space.


1.4.1  3-D  Rotation  Around  the  Vertical  Axis 

When an object rotates around the vertical ($Y$) axis by an angle $\theta$,
$\mathbf{r}_\phi$ in equation (1) above becomes $\mathbf{r}_x$, which is a horizontal
vector of length $r_x$. Therefore, the new position of a point $p = (x, y)$ is given by
$p' = (x', y')$ where:

\[
\begin{aligned}
x' &= (x - r_x)\cos\theta + z\sin\theta + r_x = x\cos\theta + z\sin\theta + r_x(1 - \cos\theta) \\
y' &= y
\end{aligned}
\]

This expression gives the new coordinates $(x', y')$ in terms of the original coordinates
$(x, y)$, the rotation angle $\theta$, the local depth $z$ and the radius of curvature
$r_x$. Next we show that the new image can be expressed instead as the linear combination
of three 2-D images.
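The role of the curvature term can be seen in a case where the approximation is exact. The sketch below assumes a vertical circular cylinder (a hypothetical object, with made-up position and radius) rotating about the Y-axis: its horizontal cross-section stays a circle, so the true silhouette edge is the rotated center plus the radius. The function names are introduced here for illustration only.

```python
import math

def predicted_rim_x(x, z, r_x, theta):
    """x-coordinate of the new rim point according to the expression above."""
    return (x - r_x) * math.cos(theta) + z * math.sin(theta) + r_x

def true_rim_x(cx, cz, radius, theta):
    """Right-hand silhouette edge of a vertical cylinder after rotation about Y."""
    # The rotated cross-section is still a circle; only its center moves.
    return cx * math.cos(theta) + cz * math.sin(theta) + radius

def sharp_edge_x(x, z, theta):
    """Prediction that treats the rim point as fixed on the object."""
    return x * math.cos(theta) + z * math.sin(theta)

# Hypothetical cylinder: axis through (cx, ., cz), radius 0.5.
cx, cz, radius = 2.0, 1.0, 0.5
theta = math.radians(40)

# Initial rim point on the right-hand side of the silhouette.
x, z, r_x = cx + radius, cz, radius

curvature_error = abs(predicted_rim_x(x, z, r_x, theta)
                      - true_rim_x(cx, cz, radius, theta))
sharp_edge_error = abs(sharp_edge_x(x, z, theta)
                       - true_rim_x(cx, cz, radius, theta))
```

For this circular cross-section `curvature_error` vanishes up to rounding, while `sharp_edge_error` equals $r_x(1 - \cos\theta)$, the amount by which the rim slides over the surface.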

Let $P_1$, $P_2$ and $P_3$ be three images of an object $O$ rotating around the vertical
($Y$) axis. $P_2$ is obtained from $P_1$ by a rotation by an angle $\alpha$, and $P_3$ by a
rotation by an angle $\beta$ ($\alpha \neq \beta$; $\alpha, \beta \neq k\pi$). Let
$\hat{P}$ be another image of the same object obtained from $P_1$ by a rotation by an angle
$\theta$ around the vertical axis. We assume that the curvature scheme gives a sufficiently
close approximation to the images. Under this assumption, the positions of a point
$p = (x, y, z) \in O$ can be expressed in the following manner:

\[
\begin{aligned}
p_1 &= (x_1, y_1) = (x,\; y) \in P_1 \\
p_2 &= (x_2, y_2) = (x\cos\alpha + z\sin\alpha + r_x(1 - \cos\alpha),\; y) \in P_2 \\
p_3 &= (x_3, y_3) = (x\cos\beta + z\sin\beta + r_x(1 - \cos\beta),\; y) \in P_3 \\
\hat{p} &= (\hat{x}, \hat{y}) = (x\cos\theta + z\sin\theta + r_x(1 - \cos\theta),\; y) \in \hat{P}
\end{aligned}
\]


Claim: $\hat{P}$ is a linear combination of $P_1$, $P_2$, $P_3$. That is, there exist
scalars $a$, $b$ and $c$ such that for every four corresponding points
$p_1, p_2, p_3, \hat{p}$:

\[ \hat{x} = a x_1 + b x_2 + c x_3 \]

with:

\[
\begin{aligned}
a + b + c &= 1 \\
a^2 + b^2 + c^2 + 2ab\cos\alpha + 2ac\cos\beta + 2bc\cos(\beta - \alpha) &= 1
\end{aligned}
\]

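Proof: We construct $a$, $b$ and $c$ explicitly. Let:

\[
\begin{aligned}
a &= \frac{\sin(\alpha - \theta) - \sin(\beta - \theta) - \sin(\alpha - \beta)}
          {\sin\alpha - \sin\beta - \sin(\alpha - \beta)} \\
b &= \frac{-\sin\beta + \sin\theta + \sin(\beta - \theta)}
          {\sin\alpha - \sin\beta - \sin(\alpha - \beta)} \\
c &= \frac{\sin\alpha - \sin\theta - \sin(\alpha - \theta)}
          {\sin\alpha - \sin\beta - \sin(\alpha - \beta)}
\end{aligned}
\]

($\alpha \neq \beta$ and $\alpha, \beta \neq k\pi$ implies that
$\sin\alpha - \sin\beta - \sin(\alpha - \beta) \neq 0$.) It follows that: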

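\[
\begin{aligned}
a x_1 + b x_2 + c x_3 ={}&
\frac{\sin(\alpha - \theta) - \sin(\beta - \theta) - \sin(\alpha - \beta)}
     {\sin\alpha - \sin\beta - \sin(\alpha - \beta)}\; x \\
&+ \frac{-\sin\beta + \sin\theta + \sin(\beta - \theta)}
        {\sin\alpha - \sin\beta - \sin(\alpha - \beta)}\;
   \bigl(x\cos\alpha + z\sin\alpha + r_x(1 - \cos\alpha)\bigr) \\
&+ \frac{\sin\alpha - \sin\theta - \sin(\alpha - \theta)}
        {\sin\alpha - \sin\beta - \sin(\alpha - \beta)}\;
   \bigl(x\cos\beta + z\sin\beta + r_x(1 - \cos\beta)\bigr) \\
={}& x\cos\theta + z\sin\theta + r_x(1 - \cos\theta) = \hat{x}
\end{aligned}
\]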

Therefore, an image of an object rotating around the vertical axis and described accurately
by the curvature method is always a linear combination of three model images. In
addition, if we substitute the values above for $a$, $b$ and $c$ in the two functional constraints
we obtain that:

$$a + b + c = 1$$

$$a^2 + b^2 + c^2 + 2ab\cos\alpha + 2ac\cos\beta + 2bc\cos(\beta - \alpha) = 1$$
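
The claim can be checked numerically. The following is a minimal sketch, not part of the original implementation: it assumes NumPy, uses randomly generated values for $x$, $z$ and $r_x$, and picks the angles $\alpha = 30°$, $\beta = -30°$, $\theta = 15°$ purely for illustration. The function names are hypothetical.

```python
import numpy as np

def lc_coefficients(alpha, beta, theta):
    """Coefficients a, b, c from the proof above (angles in radians)."""
    d = np.sin(alpha) - np.sin(beta) - np.sin(alpha - beta)
    a = (np.sin(alpha - theta) - np.sin(beta - theta) - np.sin(alpha - beta)) / d
    b = (-np.sin(beta) + np.sin(theta) + np.sin(beta - theta)) / d
    c = (np.sin(alpha) - np.sin(theta) - np.sin(alpha - theta)) / d
    return a, b, c

def view_x(x, z, rx, angle):
    """x-coordinate after rotating by `angle` about the vertical axis (curvature method)."""
    return x * np.cos(angle) + z * np.sin(angle) + rx * (1.0 - np.cos(angle))

# Hypothetical object: 50 contour points with random x, depth z and curvature r_x.
rng = np.random.default_rng(0)
x, z, rx = rng.normal(size=(3, 50))

alpha, beta, theta = np.radians([30.0, -30.0, 15.0])  # illustrative angles
a, b, c = lc_coefficients(alpha, beta, theta)

x1 = view_x(x, z, rx, 0.0)      # first model picture
x2 = view_x(x, z, rx, alpha)    # second model picture
x3 = view_x(x, z, rx, beta)     # third model picture
x_hat = view_x(x, z, rx, theta) # novel view

residual = np.max(np.abs(a * x1 + b * x2 + c * x3 - x_hat))
constraint_sum = a + b + c
constraint_quad = (a**2 + b**2 + c**2 + 2*a*b*np.cos(alpha)
                   + 2*a*c*np.cos(beta) + 2*b*c*np.cos(beta - alpha))
print(residual, constraint_sum, constraint_quad)
```

The residual should vanish up to rounding error, and both functional constraints should evaluate to 1.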


1.4.2  General  Rotation  in  3-D  Space 

In this section we first derive an expression for the image deformation of an object with
smooth boundaries under general 3-D rotation. We then use this expression to show that
the deformed image can be expressed as the linear combination of five images.

Computing  the  transformed  image. 

Using the curvature method we can predict the appearance of an object undergoing a
general rotation in 3-D space as follows. A rotation in 3-D space can be decomposed
into the following three successive rotations: a rotation around the Z-axis, a subsequent
rotation around the X-axis, and a final rotation around the Z-axis, by angles $\alpha$, $\beta$ and
$\gamma$ respectively. Since the Z-axis coincides with the line of sight, a rotation around the
Z-axis is simply an image rotation. Therefore, only the second rotation deforms the
object, and the curvature method must be applied to it. Suppose that the curvature
vector at a given point $p = (x, y)$ before the first Z-rotation is $(r_x, r_y)$. Following the
rotation by $\alpha$ it becomes $r'_x = r_x\cos\alpha - r_y\sin\alpha$ and $r'_y = r_x\sin\alpha + r_y\cos\alpha$. The second
rotation is around the X-axis, and therefore the appropriate $r$ to be used in eq. (1)
becomes $r'_y = r_x\sin\alpha + r_y\cos\alpha$. The complete rotation (all three rotations) therefore
takes a point $p = (x, y)$ through the following sequence of transformations:

$$
\begin{aligned}
(x, y) \;\to\;& (x\cos\alpha - y\sin\alpha,\; x\sin\alpha + y\cos\alpha) \\
\to\;& (x\cos\alpha - y\sin\alpha,\; (x\sin\alpha + y\cos\alpha)\cos\beta - z\sin\beta + (r_x\sin\alpha + r_y\cos\alpha)(1 - \cos\beta)) \\
\to\;& \big((x\cos\alpha - y\sin\alpha)\cos\gamma + ((x\sin\alpha + y\cos\alpha)\cos\beta - z\sin\beta + (r_x\sin\alpha + r_y\cos\alpha)(1 - \cos\beta))\sin\gamma, \\
&\;\; (x\cos\alpha - y\sin\alpha)\sin\gamma + ((x\sin\alpha + y\cos\alpha)\cos\beta - z\sin\beta + (r_x\sin\alpha + r_y\cos\alpha)(1 - \cos\beta))\cos\gamma\big)
\end{aligned}
$$

(The first of these transformations is the first Z-rotation, the second is the deformation
caused by the X-rotation, and the third is the final Z-rotation.)


This is an explicit expression of the final coordinates of a point on the object’s contour.
This can also be expressed more compactly as follows. Let $R = \{r_{ij}\}$ be a $3 \times 3$ rotation
matrix. Let $\alpha$, $\beta$ and $\gamma$ be the angles of the Z-X-Z rotations represented by $R$. We
construct a new matrix $R' = \{r'_{ij}\}$ of size $2 \times 5$ as follows:

$$
R' = \begin{pmatrix}
r_{11} & r_{12} & r_{13} & \sin\alpha(1 - \cos\beta)\sin\gamma & \cos\alpha(1 - \cos\beta)\sin\gamma \\
r_{21} & r_{22} & r_{23} & \sin\alpha(1 - \cos\beta)\cos\gamma & \cos\alpha(1 - \cos\beta)\cos\gamma
\end{pmatrix}
$$

Let $p = (x, y)$ be a contour point with depth $z$ and curvature vector $(r_x, r_y)$, and let
$\tilde{p} = (x, y, z, r_x, r_y)$. Then, the new appearance of $p$ after a rotation $R$ is applied to the
object is described by:

$$p' = R'\tilde{p} \qquad (2)$$

This is true because eq. (2) is equivalent to eq. (1) in section 1.4 with the appropriate
values for $r$.


Expressing  the  transformed  image  as  a  linear  combination. 

Let $O$ be a set of points of an object rotating in 3-D space. Let $P_1$, $P_2$, $P_3$, $P_4$ and
$P_5$ be five images of $O$, obtained by applying rotation matrices $R_1, \ldots, R_5$ respectively.
$\hat{P}$ is an image of the same object obtained by applying a rotation matrix $\hat{R}$ to $O$. Let
$R'_1, \ldots, R'_5, \hat{R}'$ be the corresponding $2 \times 5$ matrices representing the transformations applied
to the contour points according to the curvature method. Finally, let $r_1, \ldots, r_5, \hat{r}$ denote
the first row vectors of $R'_1, \ldots, R'_5, \hat{R}'$, and $s_1, \ldots, s_5, \hat{s}$ the second row vectors of $R'_1, \ldots, R'_5, \hat{R}'$
respectively. The positions of a point $p = (x, y) \in O$, with $\tilde{p} = (x, y, z, r_x, r_y)$, in the six
pictures are then given by:

$$p_i = (x_i, y_i) = (r_i\tilde{p},\; s_i\tilde{p}) \in P_i, \qquad 1 \le i \le 5$$

$$\hat{p} = (\hat{x}, \hat{y}) = (\hat{r}\tilde{p},\; \hat{s}\tilde{p}) \in \hat{P}$$

Claim: If both sets $\{r_1, \ldots, r_5\}$ and $\{s_1, \ldots, s_5\}$ are linearly independent vectors then
there exist scalars $a_1, \ldots, a_5$ and $b_1, \ldots, b_5$ such that for every point $p \in O$ it holds that:

$$\hat{x} = \sum_{i=1}^{5} a_i x_i$$

$$\hat{y} = \sum_{i=1}^{5} b_i y_i$$


Proof: $\{r_1, \ldots, r_5\}$ are linearly independent. Therefore, they span $\mathcal{R}^5$, and there exist
scalars $a_1, \ldots, a_5$ such that:

$$\hat{r} = \sum_{i=1}^{5} a_i r_i$$

Since:

$$\hat{x} = \hat{r}\tilde{p}$$

Then:

$$\hat{x} = \sum_{i=1}^{5} a_i r_i \tilde{p}$$

That is:

$$\hat{x} = \sum_{i=1}^{5} a_i x_i$$

In a similar way we obtain that:

$$\hat{y} = \sum_{i=1}^{5} b_i y_i$$
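
The argument of the proof can be illustrated numerically. The sketch below is an illustration only: it assumes NumPy and replaces the actual first rows of the $2 \times 5$ curvature-method matrices with hypothetical, well-conditioned random 5-vectors, since the proof uses nothing about these rows beyond their linear independence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the first rows r_1..r_5 of the five model
# matrices (one per row). The identity term keeps them linearly
# independent and well conditioned.
model_rows = np.eye(5) + 0.1 * rng.uniform(-1.0, 1.0, size=(5, 5))
novel_row = rng.normal(size=5)  # stand-in for the first row of the novel view

# Solve sum_i a_i r_i = r_hat, i.e. model_rows.T @ a = novel_row.
a = np.linalg.solve(model_rows.T, novel_row)

# Extended points (x, y, z, r_x, r_y), one per column.
points = rng.normal(size=(5, 40))
model_x = model_rows @ points   # row i holds x_i for every point
novel_x = novel_row @ points    # x-coordinates in the novel view

max_error = np.max(np.abs(a @ model_x - novel_x))
print(max_error)
```

The same coefficients reproduce the novel x-coordinate of every point, because they depend only on the row vectors and not on the points themselves.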

In addition, for pure rotation, the coefficients of these linear combinations satisfy seven
functional constraints. These constraints, which are second-degree polynomials, are given
in Appendix A.

Again, one may or may not actually test for these additional constraints. If the test
is omitted, the probability of a false-positive misidentification is slightly increased.

As in the case of sharp boundaries, it is possible to use mixed x- and y-coordinates to
reduce the number of basic views for general linear transformations (Section 1.3.5). For
example, one can use five basis vectors $(x_1, x_2, x_3, y_1, y_2)$ taken from three distinct views
as the basis for the x- and y-coordinates in all other views.

1.4.3  Rigid  Transformation  and  Scaling  in  3-D  Space 

So far we have shown that an object with smooth boundaries, represented by the curvature
scheme, and undergoing a rotation in 3-D space, can be represented as a linear
combination of 2-D views. The method can be easily extended to handle translation by
taking, as before, an additional image of the object. The linear combination scheme for
objects with smooth bounding contours is thus a direct extension of the scheme in section
1.3 for objects with sharp boundaries. In both cases, object views are expressed as the
linear combination of a small number of pictures. The scheme for objects with sharp
boundaries can be viewed as a special case of the more general one, when $r$, the radius
of curvature, vanishes. In practice, we found that it is also possible to use the scheme for
sharp boundaries, which uses a smaller number of views in each model, for general objects,
provided that $r$ is not too large (and at the price of increasing the number of models).

1.4.4  Summary


In this section we have shown that an object with smooth boundaries undergoing rigid
transformations and scaling in 3-D space followed by an orthographic projection, can be
expressed (within the approximation of the curvature method) as the linear combination
of six images of the object. Five images are used to represent rotations in 3-D space,
and one additional image is required to represent translations. (In fact, although the
coordinates are expressed in terms of five basis vectors, only three distinct views are
needed for a general linear transformation.) The scaling does not require any additional
image since it is represented by a scaling of the coefficients. This scheme was implemented
and applied to images of 3-D objects.

Figures 3 and 4 show the application of the LC (linear combination) method to complex
objects with smooth bounding contours. Since the rotation was about the vertical
axis, three 2-D views were used for each model. The figures show a good agreement
between the actual image and the appropriate linear combination. Although the objects
are similar, they are easily discriminable by the LC method within the entire 60° rotation
range.

Finally, it is worth noting that the modeling of objects by linear combinations of
stored pictures is not limited only to rigid objects. The method can also be used to
deal with various types of non-rigid transformations, such as articulations and non-rigid
stretching. For example, in the case of an articulated object, the object is composed of a
number of rigid parts linked together by joints that constrain the relative movement of
the parts. We saw that the x- and y-coordinates of a rigid part are constrained to a 4-D
subspace. Two rigid parts reside within an 8-D subspace, but, because of the constraints
at the joints, they usually occupy a smaller subspace (e.g., 6-D for a planar joint).

Figure 3: (a) Three model pictures of a VW car for rotations around the vertical axis. The second
and the third pictures were obtained from the first by rotations of ±30° around the Y-axis.
(b) Two linear combinations of the VW model. The x-coefficients are (0.556, 0.463, -0.018)
and (0.582, -0.065, 0.483), which correspond to a rotation of the first model picture by ±15°.
These are artificial images, created by linear combinations of the first three views, rather than
actual views. (c) Real images of a VW car. (d) Matching the linear combinations to the real
images. Each contour image is a linear combination superimposed on the actual image. The
agreement is good within the entire range of ±30°. (e) Matching the VW model to pictures of
the Saab car.


Figure 4: (a) Three model pictures of a Saab car taken with approximately the same transformations
as the VW model pictures. (b) Two linear combinations of the Saab model. The
x-coefficients are (0.601, 0.471, -0.072) and (0.754, -0.129, 0.375), which correspond to a rotation
of the first model picture by ±15°. (c) Real images of a Saab car. (d) Matching the linear
combinations to the real images. (e) Matching the Saab model to pictures of the VW car.

2  Determining  the  Alignment  Coefficients


In the previous sections we have shown that the set of possible views of an object can
often be expressed as the linear combination of a small number of views. In this section
we examine the problem of determining the transformation between a model and a
viewed object. The model is given in this scheme as a set of $k$ corresponding 2-D images
$\{M_1, \ldots, M_k\}$. A viewed object $P$ is an instance of this model if there exists a set of
coefficients $\{a_1, \ldots, a_k\}$ (with a possible set of restrictions $F(a_1, \ldots, a_k) = 0$) such that:

$$P = a_1 M_1 + \ldots + a_k M_k \qquad (3)$$

In practice we may not obtain a strict equality. We will attempt to minimize, therefore,
the difference between $P$ and $a_1 M_1 + \ldots + a_k M_k$. The problem we face is how to determine
the coefficients $\{a_1, \ldots, a_k\}$. In the following subsections we will discuss three alternative
methods for approaching this problem.

2.1  Minimal  Alignment:  Using  a  Small  Number  of  Corresponding  Features

The coefficients of the linear combination that align the model to the image can be determined
using a small number of features, identified in both the model and the image to be
recognized. This is similar to previous work in the framework of the alignment approach
[Fischler & Bolles 1981, Huttenlocher & Ullman 1987, Lowe 1985, Ullman 1986, 1989]. It
has been shown that three corresponding points or lines are usually sufficient to determine
the transformation that aligns a 3-D model to a 2-D image [Ullman 1986, 1989,
Huttenlocher & Ullman 1987, Shoham & Ullman 1988], assuming the object can undergo
only rigid transformations and uniform scaling. In previous methods, 3-D models of the
object were stored. The corresponding features (lines and points) were then used to
recover the 3-D transformation separating the viewed object from the stored model.

The  coefficients  of  the  linear  combination  required  to  align  the  model  views  with  the 
image  can  be  derived  in  principle,  as  in  previous  methods,  by  first  recovering  the  3-D 
transformations.  They  can  also  be  derived  directly,  however,  by  simply  solving  a  set  of 
linear  equations.  This  method  requires  k  points  to  align  a  model  of  k  pictures  to  a  given 
image.  Therefore,  four  points  are  required  to  determine  the  transformation  for  objects 
with  sharp  edges,  and  six  points  for  objects  with  smooth  boundaries.  In  this  way  we 
can  deal  with  any  transformation  that  can  be  approximated  by  linear  combinations  of 
pictures,  without  recovering  the  3-D  transformations  explicitly. 

The coefficients of the linear combination are determined by solving the following
equations. We assume that a small number of corresponding points (the “alignment
points”) have been identified in the image and the model. Let $X$ be the matrix of the
x-coordinates of the alignment points in the model. That is, $x_{ij}$ is the x-coordinate of
the $j$'th point in the $i$'th model picture. $p_x$ is the vector of x-coordinates of the alignment
points in the image, and $a$ is the vector of unknown alignment parameters. The linear
system to be solved is then $Xa = p_x$. The alignment parameters are given by $a = X^{-1}p_x$
if an exact solution exists. We may use an overdetermined system (by using additional
points), in which case $a = X^{+}p_x$ (where $X^{+}$ denotes the pseudo-inverse of $X$). The
matrix $X^{+}$ does not depend on the image and can be pre-computed for the model. The
recovery of the coefficients therefore requires only a multiplication of $p_x$ by a known
matrix. Similarly, we solve $Yb = p_y$ to extract the alignment parameters $b$ in the
y-direction from $Y$ (the matrix of y-coordinates in the model) and $p_y$ (the corresponding
y-coordinates in the image).
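
A minimal sketch of this computation, assuming NumPy, is given below. Several choices here are assumptions made for illustration rather than details of the original implementation: the matrices are laid out with one row per alignment point and one column per model picture (so that the product $Xa$ is defined), the object is a hypothetical sharp-edged point set, and random linear maps stand in for the actual model views and the novel view.

```python
import numpy as np

rng = np.random.default_rng(2)
n_points, k = 10, 4  # 10 alignment points, 4 model pictures (overdetermined)

# Hypothetical sharp-edged object: homogeneous 3-D points (x, y, z, 1), one per row.
points = np.column_stack([rng.normal(size=(n_points, 3)), np.ones(n_points)])

# Random linear maps stand in for the views: X[j, i] and Y[j, i] are the
# x- and y-coordinates of alignment point j in model picture i.
X = points @ rng.normal(size=(4, k))
Y = points @ rng.normal(size=(4, k))

# Coordinates of the same alignment points in a novel image.
p_x = points @ rng.normal(size=4)
p_y = points @ rng.normal(size=4)

# The pseudo-inverses depend only on the model and can be precomputed.
X_pinv = np.linalg.pinv(X)
Y_pinv = np.linalg.pinv(Y)

# Recovering the coefficients is then a single matrix-vector product per axis.
a = X_pinv @ p_x
b = Y_pinv @ p_y

residual_x = np.max(np.abs(X @ a - p_x))
residual_y = np.max(np.abs(Y @ b - p_y))
print(a, b, residual_x, residual_y)
```

With noise-free coordinates the residuals vanish up to rounding error; with noisy alignment points the same product returns the least-squares coefficients.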

It  is  also  worth  noting  that  the  computation  can  proceed  in  a  similar  fashion  on  the 
basis  of  correspondence  between  straight  line  segments  rather  than  points.  In  this  case, 
due  to  the  “aperture  problem”  [Marr  &  Ullman  1981],  only  the  perpendicular  component 
(to  the  contour)  of  the  displacement  can  be  measured.  This  component  can  be  used, 
however,  in  the  equations  above.  In  this  case  each  contour  segment  contributes  a  single 
equation  (as  opposed  to  a  point  correspondence,  that  gives  two  equations). 

One  question  that  may  arise  in  this  context  is  whether  the  visual  system  can  be 
expected  to  extract  reliably  a  sufficient  number  of  alignment  features.  Two  comments 
are  noteworthy.  First,  this  difficulty  is  not  specific  to  the  linear  combination  scheme, 
but  applies  to  other  alignment  schemes  as  well.  Second,  although  the  task  is  not  simple, 
the  phenomenon  of  apparent  motion  suggests  that  mechanisms  for  establishing  feature 
correspondence  do  in  fact  exist  in  the  visual  system. 

It  is  interesting  to  note  in  this  regard  that  the  correspondence  established  during 
apparent  motion  appears  to  provide  sufficient  information  for  the  purpose  of  recognition 
by  linear  combinations.  For  example,  when  the  car  pictures  in  figure  5(a)  are  shown 
in  apparent  motion,  the  points  marked  in  the  left  picture  appear  perceptually  to  move 
and  match  the  corresponding  points  marked  in  the  right  picture.  These  points,  with 
the  perceptually  established  match,  were  used  to  align  the  model  and  images  in  figure 
5.  That  is,  the  coordinates  of  these  points  were  used  in  the  equations  above  to  recover 
the  alignment  coefficients.  The  model  contained  six  pictures  of  a  Saab  car  in  order  to 
cover  all  rigid  transformations  for  an  object  with  smooth  boundaries.  As  can  be  seen, 
a close agreement was obtained between the image and the transformed model. (The
model contained only a subset of the contours, the ones that were clearly visible in all of
the different pictures.)

Figure 5: Aligning a model to images using corresponding features. (a) Two images of a Saab
car, and one of the six model pictures. (b) The corresponding points used to align the model
to the images. The correspondence was determined using apparent motion, as explained in the
text. (c) The transformed model. (d) The transformed model superimposed on the original
images.

2.2  Searching  for  the  Coefficients


An alternative method to determine the best linear combination is by a search in the
space of possible coefficients. In this method we choose some initial values for the set
$\{a_1, \ldots, a_k\}$ of coefficients, then we apply a linear combination to the model using this set
of coefficients. We repeat this process using a different set of coefficients, and take the
coefficient values that produced the best match of the model to the image.

The most problematic aspect of this method is that the domain of coefficients might
be large, so that the search might be prohibitively expensive. We can reduce the search space
by first performing a rough alignment of the model to the image. The identification of
general features in both the image and the model, such as a dominant orientation, the
center of gravity, and a measurement of the overall size of the imaged object, can be used
for compensating roughly for image rotation, translation and scaling. Assuming that
this process compensates for these transformations up to a bounded error, and that the
rotations in 3-D space covered by the model are also restricted, then we could restrict the
search for the best coefficients to a limited domain. Moreover, the search can be guided
by an optimization procedure. We can define an error measure (for instance, the area
enclosed between the transformed model and the image) that must be minimized, and use
minimization techniques such as gradient descent to make the search more efficient. The
preliminary stage of rough alignment may help prevent such methods from reaching
a local minimum instead of the global one.
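
The guided search can be sketched as follows, assuming NumPy. This is a simplified illustration and not the procedure used in our experiments: it replaces the area-based error measure with a sum of squared coordinate differences (which presumes a known pointwise correspondence), restricts the coefficients to a box that stands in for the limited domain obtained after rough alignment, and uses a hypothetical random model. The function name `search_coefficients` is likewise hypothetical.

```python
import numpy as np

def search_coefficients(model, image, start, bounds, steps=2000):
    """Projected gradient descent on E(a) = ||model @ a - image||^2.

    model  : (n, k) array, column i holds the coordinates of model picture i
    image  : (n,) array of image coordinates
    bounds : (low, high) box to which the coefficients are restricted
    """
    low, high = bounds
    # Step size 1/L, where L = 2 * sigma_max^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(model, 2) ** 2)
    a = np.clip(np.asarray(start, dtype=float), low, high)
    for _ in range(steps):
        gradient = 2.0 * model.T @ (model @ a - image)
        a = np.clip(a - step * gradient, low, high)  # stay inside the search domain
    return a

rng = np.random.default_rng(6)
model = rng.normal(size=(30, 3))      # hypothetical: 30 coordinates, 3 model pictures
a_true = np.array([0.6, 0.5, -0.1])   # coefficients that generated the image
image = model @ a_true

a_found = search_coefficients(model, image, start=[1/3, 1/3, 1/3], bounds=(-1.0, 1.0))
error = np.linalg.norm(model @ a_found - image)
print(a_found, error)
```

Because this simplified error is quadratic, it has a single minimum; with an area-based measure the error surface is generally not convex, which is where the rough initial alignment becomes important.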

2.3  Linear  Mappings 

The linear combination scheme is based on the fact that a 3-D object can be modeled by
the linear combination of a small number of pictures. That is, the set of possible views of
an object is embedded in a linear space of a low dimensionality. We can use this property
to construct a linear operator that maps each member of such a space to a predefined
vector, which identifies the object. This method is different from the previous two in
that we do not recover explicitly the coefficients $(a_1, \ldots, a_k)$ of the linear combination.
Instead, we assume that a full correspondence has been established between the viewed
object and the stored model. We then use a linear mapping to test whether the viewed
object is a linear combination of the model views.

Suppose that a pattern $P$ is represented by a vector $p$ of its coordinates (e.g.,
$(x_1, y_1, x_2, y_2, \ldots, x_n, y_n)$). Let $P_1$ and $P_2$ be two different patterns representing the same
object. We can now construct a matrix $L$ that maps both $p_1$ and $p_2$ to the same output
vector $q$. That is, $Lp_1 = Lp_2 = q$. Any linear combination $ap_1 + bp_2$ will then be
mapped to the same output vector $q$, multiplied by the scalar $a + b$. We can choose, for
example, $q = p_1$, in which case any view of the object will be mapped by $L$ to a selected
“canonical view” of it.

We have seen above that different views of the same object can usually be expressed as
linear combinations $\sum a_i p_i$ of a small number of representative views, $P_i$. If the mapping
matrix $L$ is constructed in such a manner that $Lp_i = q$ for all the views $P_i$ in the same
model, then any combined view $p = \sum a_i p_i$ will be mapped by $L$ to the same $q$ (up to
a scale), since $Lp = (\sum a_i)q$.

$L$ can be constructed as follows. Let $\{p_1, \ldots, p_k\}$ be $k$ linearly independent vectors
representing the model pictures (we can assume that they are all linearly independent
since a picture that is not is obviously redundant). Let $\{p_{k+1}, \ldots, p_n\}$ be a set of vectors
such that $\{p_1, \ldots, p_n\}$ are all linearly independent. We define the following matrices:

$$P = (p_1, \ldots, p_k, p_{k+1}, \ldots, p_n)$$

$$Q = (q, \ldots, q, p_{k+1}, \ldots, p_n)$$

We require that:

$$LP = Q$$

Therefore:

$$L = QP^{-1}$$

Note that since $P$ is composed of $n$ linearly independent vectors, the inverse matrix $P^{-1}$
exists, and therefore $L$ can always be constructed.


By this definition we obtain a matrix $L$ that maps any linear combination of the set of
vectors $\{p_1, \ldots, p_k\}$ to a scaled pattern $\alpha q$. Furthermore, when the completing vectors
$p_{k+1}, \ldots, p_n$ are chosen orthogonal to $\{p_1, \ldots, p_k\}$, it maps any vector orthogonal
to $\{p_1, \ldots, p_k\}$ to itself. Therefore, if $p$ is a linear combination of $\{p_1, \ldots, p_k\}$ with an
additional orthogonal noise component, it would be mapped by $L$ to $q$ (up to a scale) combined with
the same amount of noise.
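
The construction $L = QP^{-1}$ can be sketched as follows, assuming NumPy. The model vectors are hypothetical random vectors, the canonical output is taken to be $q = p_1$, and the completing vectors $p_{k+1}, \ldots, p_n$ are taken (as an assumption of this sketch) to be an orthonormal basis of the orthogonal complement of the model views, obtained from a singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 3

# Hypothetical model: k linearly independent view vectors p_1..p_k as columns.
model = rng.normal(size=(n, k))

# Complete to a basis of R^n with an orthonormal basis of the orthogonal
# complement of the model views (the last n - k left singular vectors).
u, _, _ = np.linalg.svd(model, full_matrices=True)
complement = u[:, k:]

q = model[:, 0]                                            # canonical view q = p_1
P = np.hstack([model, complement])                         # (p_1..p_k, p_{k+1}..p_n)
Q = np.hstack([np.tile(q[:, None], (1, k)), complement])   # (q..q, p_{k+1}..p_n)
L = Q @ np.linalg.inv(P)

coeffs = np.array([0.7, 0.5, -0.4])
view = model @ coeffs                                # a linear combination of the views
noise = complement @ rng.normal(size=n - k)          # component orthogonal to the views

mapped_view = L @ view                               # equals (sum of coeffs) * q
mapped_noise = L @ noise                             # orthogonal noise is left unchanged
mapped_noisy_view = L @ (view + noise)
print(np.allclose(mapped_view, coeffs.sum() * q))
```

In this sketch a combined view is sent to $q$ scaled by the sum of its coefficients, and the orthogonal noise component passes through unchanged.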


In constructing the matrix $L$, one may use more than just $k$ vectors $p_i$, particularly if
the input data is noisy. In this case a problem arises of estimating the best $k$-dimensional
linear subspace spanned by a larger collection of vectors. This problem is treated in
Appendix B.

In our implementation we have used $Lp_i = 0$ for all the view vectors $p_i$ of a given
object. The reason is that, with a nonzero common output $q$, a new view of the object $p$ given
by $\sum a_i p_i$ with $\sum a_i = 0$ would yield $Lp = (\sum a_i)q = 0$. This means that the linear mapping $L$
may send a legal view to the zero vector, and it is therefore convenient to choose the zero vector
as the common output for all the object’s views. If it is desirable to obtain at the output level
a canonical view of the object such as $p_1$ rather than the zero vector, then one can use as the
final output the vector $p_1 - Lp$.

The decision regarding whether or not $p$ is a view of the object represented by $L$ can
be based on comparing $\|Lp\|$ with $\|p\|$. If $p$ is indeed a view of the object, then the
ratio $\|Lp\| / \|p\|$ will be small (exactly 0 in the noise-free condition). If the view is “pure noise” (in
the space orthogonal to the span of $(p_1, \ldots, p_k)$), then this ratio will be equal to 1.
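
The decision rule can be sketched as follows, assuming NumPy. As before, the model is a set of hypothetical random view vectors and the completing vectors are assumed to be an orthonormal basis of the orthogonal complement of the views; the helper names `null_mapping` and `mismatch_ratio` are illustrative and do not come from the original implementation.

```python
import numpy as np

def null_mapping(model):
    """Build L = Q P^{-1} with Q = (0, ..., 0, p_{k+1}, ..., p_n), so that L p_i = 0."""
    n, k = model.shape
    u, _, _ = np.linalg.svd(model, full_matrices=True)
    complement = u[:, k:]                       # orthonormal, orthogonal to the views
    P = np.hstack([model, complement])
    Q = np.hstack([np.zeros((n, k)), complement])
    return Q @ np.linalg.inv(P)

def mismatch_ratio(L, p):
    """Ratio ||L p|| / ||p||: near 0 for a view of the object, 1 for orthogonal noise."""
    return np.linalg.norm(L @ p) / np.linalg.norm(p)

rng = np.random.default_rng(7)
model = rng.normal(size=(8, 3))                 # hypothetical view vectors p_1..p_3
L = null_mapping(model)

view = model @ np.array([0.2, 0.9, -0.1])       # a legal view of the object
noise = rng.normal(size=8)
noise -= model @ np.linalg.pinv(model) @ noise  # keep only the part orthogonal to the views

ratio_view = mismatch_ratio(L, view)
ratio_noise = mismatch_ratio(L, noise)
canonical = model[:, 0] - L @ view              # final output p_1 - L p
print(ratio_view, ratio_noise)
```

A threshold on this ratio, whose value would have to be chosen empirically, then separates views of the object from other patterns.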

The general idea is somewhat similar to the associative mappings presented in [Kohonen,
Oja & Lehtio 1981]. However, in our scheme, unlike the one presented by Kohonen,
Oja & Lehtio [1981], we take advantage of the fact that intermediate views of 3-D objects
can be expressed as the linear combination of model views. Our scheme therefore uses
the coordinates of image contours, rather than the image intensity values.

Figure 6 shows the application of the linear mapping to two models of simple geometrical
structures, a cube (a) and a pyramid (b). For each model we have constructed a
matrix that maps any linear combination of the model pictures to the first picture of the
model. The matrices were applied to images (c) and (e), and the results are presented in
(d) and (f).

2.4  The  Use  of  Linear  Receptive  Fields 

Two of the three methods above are correspondence-based. They require the identification
of corresponding features in the model and the image to be recognized to recover
the coefficients or to apply the linear mapping. In this section we suggest a method that
may be used (along with some other methods) to alleviate to some degree the problem
of establishing a pointwise correspondence.

The goal is to test whether a viewed pattern $P$ is a linear combination of patterns
in the model, without establishing a pointwise correspondence. To do this we use the
following idea. Suppose that, as before, an intermediate view $P$ is the linear combination
of two views $P_1$ and $P_2$ in the model, that is, $P = aP_1 + bP_2$. Let us take now an
arbitrary group of $l$ corresponding points in $P_1$, $P_2$ and $P$. Let $a_1, \ldots, a_l$ denote the $l$
points in pattern $P_1$, $b_1, \ldots, b_l$ in $P_2$ and $c_1, \ldots, c_l$ in $P$. Let us denote by $A_x = \sum_{i=1}^{l} a_i^x$
(i.e., the sum of the x-coordinates of all the points in $a_1, \ldots, a_l$). Similarly $A_y = \sum_{i=1}^{l} a_i^y$,
$B_x = \sum_{i=1}^{l} b_i^x$, $B_y = \sum_{i=1}^{l} b_i^y$, $C_x = \sum_{i=1}^{l} c_i^x$ and $C_y = \sum_{i=1}^{l} c_i^y$. From the linear
combination, $P = aP_1 + bP_2$, it also follows that:

$$C_x = aA_x + bB_x$$

$$C_y = aA_y + bB_y$$

(We have seen above examples in which different coefficients were used for the x- and
y-coordinates. Here we have assumed for simplicity that they are identical.) This demonstrates
that we can use corresponding subsets of points without resolving the individual
pointwise correspondence.
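
The following minimal sketch, assuming NumPy and hypothetical random point sets, illustrates the argument: the points of each pattern are shuffled independently, so the pointwise correspondence is lost, yet the coordinate sums over the corresponding subsets still obey the linear relation.

```python
import numpy as np

rng = np.random.default_rng(4)
l = 12                               # number of points in the corresponding subsets

P1 = rng.normal(size=(l, 2))         # hypothetical points a_1..a_l, rows are (x, y)
P2 = rng.normal(size=(l, 2))         # corresponding points b_1..b_l
a, b = 0.35, 0.65
P = a * P1 + b * P2                  # corresponding points c_1..c_l

# Shuffle the rows of each pattern independently: the subsets still
# correspond, but the individual point-to-point matching is unknown.
A = rng.permutation(P1).sum(axis=0)  # (A_x, A_y)
B = rng.permutation(P2).sum(axis=0)  # (B_x, B_y)
C = rng.permutation(P).sum(axis=0)   # (C_x, C_y)

print(np.allclose(C, a * A + b * B))
```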

Figure 6: (a) Applying cube and pyramid matrices to the cubes of fig. 2. (b) Applying pyramid
and cube matrices to the pyramids of fig. 2. Left column of pictures: the input images. Middle
column: the result of applying the appropriate matrix to the images; these results are identical
to the first model pictures (which serve as canonical views). Right column: the result of applying
the wrong matrix to the images; these results are not similar to the canonical views.

It is worth mentioning that if we match a sufficient number of corresponding subsets
of points, the exact point-to-point correspondence can also be resolved, and the two
methods are equivalent. However, the number of subsets may be smaller than the number
of points, or we can take subsets of points that are corresponding in most of the points,
but not in all of them, and still obtain good results (as shown below).

To  use  the  above  idea  it  becomes  necessary  to  establish  a  correspondence  between 
subsets  of  the  patterns  instead  of  the  individual  points.  There  are  several  possible  ways 
to  approach  this  problem.  Here  we  propose  a  simple  method,  motivated  in  part  by 
considerations  of  biological  plausibility,  that  is  based  on  the  notion  of  linear  receptive 
fields. 

A linear receptive field (LRF) is an operator that takes a weighted contribution of the
points falling within a given region, using a linear weighting function. We will assume
here that the LRF response is simply the average contribution of the points falling inside
its region. That is, given an image $P$, the response $r$ is given by $\overline{\alpha x + \beta y}$ (for some
parameters $\alpha$, $\beta$) where the average is taken over all the points of $P$ falling within the
receptive field.

Let us examine the response of an LRF of this type to the model and the viewed
object. Let $P_1$ and $P_2$ be two pictures in the model set, $P$ is the viewed object, and
assume that $P = aP_1 + bP_2$. Let $r_1$, $r_2$ and $r$ be the responses of the LRF to $P_1$, $P_2$ and
$P$ respectively. For each pattern, the LRF “sees” only a subset of the points comprising
the pattern. The other points fall outside the receptive field. If the points seen by the
LRF in $P_1$, $P_2$ and $P$ are corresponding points (even if the pointwise correspondence is
unknown), then it is clear from the considerations above that $r = ar_1 + br_2$. In practice,
some of the points may not have counterparts inside the LRF, but the relation will
hold approximately provided that the majority of points remain within the limits of the
receptive field in $P_1$, $P_2$ and $P$. To obtain this condition it is desirable to: (1) use large
receptive fields, and (2) apply some rough alignment, as suggested in section 2.2 above,
prior to the match.

We can now proceed along the following line. Let $r = (r_1, r_2, \ldots, r_m)$ be an ordered
set of LRFs. We define a model to be the result of applying this set $r$ to each of the
model pictures. Given an image $I$, we first perform a process of rough alignment as
described earlier, and denote the result by $I'$. We apply the set $r$ to $I'$ and then we
check whether the result is a linear combination of the model pictures, that is, we look
for a set $\{a_1, a_2, \ldots, a_k\}$ of coefficients such that for every $1 \le i \le m$ it holds that:

$$r_i(I') = \sum_{j=1}^{k} a_j r_i(P_j) \qquad (4)$$

Practically, since a strict equality can rarely be achieved, we look for a set $\{a_1, a_2, \ldots, a_k\}$
of coefficients that minimize the difference between the two terms:

$$\min_{\{a_1, \ldots, a_k\}} \Big\| r(I') - \sum_{j=1}^{k} a_j r(P_j) \Big\| \qquad (5)$$

This  problem  can  be  approached,  as  with  the  pointwise  correspondence,  by  either 
computing  a  pseudo-inverse,  or  by  performing  the  appropriate  linear  mapping. 
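
The minimization in eq. (5) can be sketched with an ordinary least-squares solve, assuming NumPy. The configuration below is hypothetical and chosen so that the relation holds exactly: the receptive fields are the four quadrants of the square $[-1, 1]^2$, each used with the two weightings $(\alpha, \beta) = (1, 0)$ and $(0, 1)$, and the second model picture is a small linear distortion of the first, so that no point crosses a field boundary. The function `lrf_responses` is an illustrative helper, not part of the original implementation.

```python
import numpy as np

def lrf_responses(points, fields):
    """Average of alpha*x + beta*y over the points inside each rectangular field."""
    responses = []
    for x0, x1, y0, y1, alpha, beta in fields:
        inside = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
                  (points[:, 1] >= y0) & (points[:, 1] < y1))
        values = alpha * points[inside, 0] + beta * points[inside, 1]
        responses.append(values.mean() if values.size else 0.0)
    return np.array(responses)

# Hypothetical model pictures: a grid of points and a slightly distorted copy.
ticks = np.array([-0.75, -0.5, -0.25, 0.25, 0.5, 0.75])
gx, gy = np.meshgrid(ticks, ticks)
P1 = np.column_stack([gx.ravel(), gy.ravel()])
P2 = P1 @ np.array([[0.95, 0.10], [-0.05, 1.00]]).T

# Roughly aligned image: a linear combination of the model pictures whose
# points arrive in an unknown order (no pointwise correspondence).
a_true = np.array([0.4, 0.6])
image = a_true[0] * P1 + a_true[1] * P2
image = image[np.random.default_rng(5).permutation(len(image))]

# Eight LRFs: four quadrant regions, each with weightings (1, 0) and (0, 1).
quadrants = [(-1, 0, -1, 0), (0, 1, -1, 0), (-1, 0, 0, 1), (0, 1, 0, 1)]
fields = [q + w for q in quadrants for w in [(1.0, 0.0), (0.0, 1.0)]]

R_model = np.column_stack([lrf_responses(P1, fields), lrf_responses(P2, fields)])
r_image = lrf_responses(image, fields)

# Least-squares solution of eq. (5).
coeffs, *_ = np.linalg.lstsq(R_model, r_image, rcond=None)
print(coeffs)
```

When some points do cross field boundaries, the same solve returns approximate coefficients, and the residual of the fit indicates how well the image is explained by the model.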

A  preliminary  stage  of  rough  alignment  is  required  in  this  scheme  to  bring  each 
point  in  the  image  to  lie  close  to  a  corresponding  position  in  the  model  (one  of  the 
model  pictures).  Consequently,  each  linear  receptive  field  will  contain  a  relatively  large 
proportion  of  corresponding  points.  As  a  result,  the  application  of  the  set  of  LRFs  to  the 
image  will  yield  approximately  a  linear  combination  of  the  results  of  applying  the  same 
set  of  LRFs  to  the  model  pictures.  The  justification  for  this  approximation  is  given  in 
Appendix  C.  We  show  there  that  as  the  proportion  of  corresponding  points  within  each 
LRF  increases,  the  result  obtained  by  the  application  of  this  set  of  LRFs  to  the  image 
gets  closer  to  a  linear  combination  of  the  results  obtained  by  applying  these  LRFs  to  the 
model  pictures. 

The  use  of  linear  receptive  fields  serves  in  this  scheme  two  distinct  purposes.  The  first 
is  to  establish  correspondence  between  subsets  of  image  points,  rather  than  individual 
points.  The  second  is  a  conversion  between  two  different  types  of  representations.  The 
linear  mapping  method  assumes  that  the  position  of  points  is  given  by  the  numerical 
values  of  their  x-  and  y-coordinates.  The  input  image  is  given,  however,  in  a  different 
representation:  a  2-D  array  of  points.  The  LRF  serves  to  translate  the  position  of  a 
point  within  the  receptive  field  to  a  value  representing  the  coordinate  of  the  point.  Other 
conversion  schemes  are  possible,  but  the  LRF  is  a  simple  one  that  also  appears  to  be 
biologically plausible. It is interesting to note that cells with linear receptive fields have
been described in area 7a of macaque monkeys [Zipser & Andersen 1988]. In Zipser &
Andersen’s model these cells also serve the role of converting position in the plane to a
firing rate that represents the x- or y-coordinate.


3  General  Discussion 

We  have  proposed  above  a  method  for  recognizing  3-D  objects  from  2-D  images.  In 
this  method,  an  object-model  is  represented  by  the  linear  combinations  of  several  2-D 
views  of  the  object.  It  was  shown  that  for  objects  with  sharp  edges  as  well  as  with 
smooth  bounding  contours  the  set  of  possible  images  of  a  given  object  is  embedded 
in  a  linear  space  spanned  by  a  small  number  of  views.  For  objects  with  sharp  edges 
the linear combination representation is exact. For objects with smooth boundaries
it is an approximation that often holds over a wide range of viewing angles. Rigid
transformations  (with  or  without  scaling)  can  be  distinguished  from  more  general  linear 
transformations  of  the  object  by  testing  certain  constraints  placed  upon  the  coefficients 
of  the  linear  combinations. 

We  have  proposed  three  alternative  methods  for  determining  the  transformation  that 
matches  a  model  to  a  given  image.  The  first  method  uses  a  small  set  of  corresponding 
features  identified  in  both  the  model  and  the  image.  Alternatively,  the  coefficients  can  be 
determined  using  a  search.  The  third  method  uses  a  linear  mapping  as  the  main  step  in 
a  scheme  that  maps  the  different  views  of  the  same  object  into  a  common  representation. 

To avoid the need for pointwise correspondence, we suggested the possible use of
linear  receptive  fields  to  establish  approximate  correspondence  between  subsets  of  points. 

The  development  of  the  scheme  so  far  has  been  primarily  theoretical,  and  initial 
testing  on  a  small  number  of  objects  shows  good  results.  Future  work  should  include 
more  extensive  testing  using  natural  objects,  as  well  as  the  advancement  of  the  theoretical 
issues  discussed  below. 

In  the  concluding  section  we  discuss  three  issues.  First,  we  place  the  current  scheme 
within  the  framework  of  alignment  methods  in  general.  Second,  we  discuss  possible 
extensions.  Finally,  we  list  a  number  of  general  conclusions  that  emerge  from  this  study. 


3.1 Classes of Alignment Schemes

The  schemes  discussed  in  this  paper  fall  into  the  general  class  of  alignment  recognition 
methods. Other alignment schemes have been proposed by Bajcsy & Solina [1987], Chien
& Aggarwal [1987], Faugeras & Hebert [1986], Fischler & Bolles [1981], Grimson &
Lozano-Perez [1984], Lowe [1985], and Thompson & Mundy [1987]. In an alignment scheme
we seek a transformation $T_\alpha$ out of a set of allowed transformations, and a model $M$
from a given set of models, that minimize a distance measure $d(M, T_\alpha, P)$ (where $P$ is
the image of the object). $T_\alpha$ is called the alignment transformation; it is supposed to
bring the model $M$ and the viewed object $P$ into an optimal agreement.

The  distance  measure  d  typically  contains  two  contributions: 


$$ d(M, T_\alpha, P) = d_1(T_\alpha M, P) + d_2(T_\alpha) $$


The first term $d_1(T_\alpha M, P)$ measures the residual distance between the picture $P$
and the transformed model $T_\alpha M$ following the alignment, and $d_2(T_\alpha)$ penalizes for the
transformation $T_\alpha$ that was required to bring $M$ into a close agreement with $P$. For
example, it may be possible to bring $M$ into a close agreement with $P$ by stretching it
considerably. In this case $d_1(T_\alpha M, P)$ will be small, but, if large stretches of the object
are unlikely, $d_2(T_\alpha)$ will be large. We will see below that different classes of alignment
schemes differ in the relative emphasis they place on $d_1$ and $d_2$.
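
The trade-off between the two terms can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration only: the transformation is restricted to a uniform scaling of 2-D points, $d_1$ is taken as the sum of point-to-point distances, and $d_2$ as the absolute logarithm of the scale factor. The function name `alignment_cost` is hypothetical.

```python
import math


def alignment_cost(model, image, scale, penalty_weight=1.0):
    """Return (d1, d2, d1 + d2) for a uniform scaling of the model points.

    d1: residual distance between the scaled model and the image points.
    d2: penalty on the transformation itself (zero for the identity scale).
    """
    transformed = [(scale * x, scale * y) for x, y in model]
    d1 = sum(math.hypot(tx - ix, ty - iy)
             for (tx, ty), (ix, iy) in zip(transformed, image))
    d2 = penalty_weight * abs(math.log(scale))
    return d1, d2, d1 + d2


model = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
image = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0)]   # the model stretched by a factor of 3

stretched = alignment_cost(model, image, scale=3.0)   # perfect fit, but penalised stretch
unscaled = alignment_cost(model, image, scale=1.0)    # no penalty, but a large residual
```

With these choices the stretched alignment has zero residual and a non-zero penalty, while the unscaled one has no penalty and a large residual, which is the situation described in the paragraph above.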

Alignment approaches can be subdivided according to the method used for determining
the aligning transformation $T_\alpha$. The main approaches used in the past can be
summarized by the following three categories.

Minimal alignment. In this approach $T_\alpha$ is determined by a small number of corresponding
features in the model and the image. Methods using this approach assume
that the set of possible transformations is restricted (usually to rigid 3-D transformations
with possible scaling, or a Lie transformation group [Brockett 1989]), so that the correct
transformation can be recovered using a small number of constraints.

This approach has been used by Faugeras & Hebert [1986], Fischler & Bolles [1981],
Huttenlocher & Ullman [1987], Shoham & Ullman [1988], Thompson & Mundy [1987], and
Ullman [1986, 1989]. In these schemes the term $d_2$ above is usually ignored, since there is
no reason to penalize for a rigid 3-D aligning transformation, and the match is therefore
evaluated by $d_1$ only.

The  correspondence  between  features  may  be  guided  in  these  schemes  by  the  labeling 
of different types of features, such as cusps, inflections, blob-centers, etc. [Huttenlocher &
Ullman 1987, Ullman 1989], by using pairwise constraints between features [Grimson &
Lozano-Perez 1984], or by a more exhaustive search (as in [Lamdan, Schwartz, & Wolfson
1987], where possible transformations are pre-computed and hashed).

Minimal  alignment  can  be  used  in  the  context  of  the  linear  combination  scheme 
discussed  in  this  paper.  This  method  was  discussed  in  Section  2.1.  A  small  number  of 
corresponding  features  is  used  to  determine  the  coefficients  of  the  linear  combination. 
The  linear  combination  is  then  computed,  and  the  result  compared  with  the  viewed 
image. 

Full  alignment.  In  this  approach  a  full  correspondence  is  established  between  the 
model  and  the  image.  This  correspondence  defines  a  distortion  transformation  that 
takes $M$ into $P$. The set of transformations is not restricted in this approach to rigid
transformations. Complex non-rigid distortions are included as well. In contrast with
minimal alignment, in the distance measure $d$ above, the first term $d_1(T_\alpha M, P)$ does
not play an important role, since the full correspondence forces $T_\alpha M$ and $P$ to be in
close agreement. The match is therefore evaluated by the plausibility of the required
transformation $T_\alpha$. Our linear mapping scheme in section 2.3 is a full alignment scheme.
A  full  correspondence  is  established  to  produce  a  vector  that  the  linear  mapping  can  then 
act  upon. 

Alignment search. In contrast with the previous approaches, this method does not
use feature correspondence to recover the transformation. Instead, a search is conducted
in the space of possible transformations. The set of possible transformations $\{T_\alpha\}$ is
parametrized by a parameter vector $\alpha$, and a search is performed in the parameter space
to determine the best value of $\alpha$. The deformable template method [Yuille, Cohen, &
Hallinan, 1989] is an example of this approach. Section 2.2 described the possibility of
performing such a search in the linear combination approach to determine the value of
the required coefficients.
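
A search over the coefficients can be sketched as follows. The search procedure itself is an assumption: a coarse-to-fine grid search over two coefficients, used here as a simple stand-in for whatever search strategy is adopted in practice. The inputs are taken to be the coordinates of points in two model views and in the observed image, listed in corresponding order, and the name `search_coefficients` is hypothetical.

```python
def search_coefficients(view1, view2, observed, half_width=2.0, steps=21, rounds=6):
    """Coarse-to-fine grid search for (a, b) minimising ||a*view1 + b*view2 - observed||^2."""

    def error(a, b):
        return sum((a * p + b * q - o) ** 2 for p, q, o in zip(view1, view2, observed))

    best_a = best_b = 0.0
    for _ in range(rounds):
        spacing = 2.0 * half_width / (steps - 1)
        offsets = [-half_width + i * spacing for i in range(steps)]
        candidates = [(best_a + da, best_b + db) for da in offsets for db in offsets]
        best_a, best_b = min(candidates, key=lambda ab: error(*ab))
        # Shrink the window around the current best, keeping a margin of two grid cells.
        half_width = 2.0 * spacing
    return best_a, best_b, error(best_a, best_b)


# Illustrative data (made up): the observed coordinates are 0.5*view1 - 1.0*view2.
view1 = [1.0, 0.0, 2.0, 1.0]
view2 = [0.0, 1.0, 1.0, 3.0]
observed = [0.5 * p - 1.0 * q for p, q in zip(view1, view2)]
a, b, err = search_coefficients(view1, view2, observed)
```

The cost of such a search grows quickly with the number of coefficients, so a sketch of this kind is only practical when the parameter space is small or a good starting estimate is available.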


3.2  Extensions 

The  linear  combination  (LC)  recognition  scheme  is  restricted  in  several  ways.  It  will  be 
of  interest  to  extend  it  in  the  future  in  at  least  three  directions:  relaxing  the  constraints, 
dealing  effectively  with  occlusions,  and  dealing  with  large  libraries  of  objects.  We  limit 
the  discussion  below  to  brief  comments  on  these  three  issues. 

Relaxing  the  constraints 

The  scheme  as  presented  assumes  rigid  transformation  and  an  orthographic  projection. 
Under  these  conditions,  all  the  views  of  a  given  object  are  embedded  in  a  low-dimensional 
linear  subspace  of  a  much  larger  space.  What  happens  if  the  projection  is  perspective 
rather  than  orthographic,  or  if  the  transformations  are  not  entirely  rigid?  The  effect  of 
perspectivity appears to be quite limited. We have applied the LC scheme to objects
with a ratio of distance-to-camera to object-size down to 4:1, with only minor effects on
the results (less than 3% deviation from the orthographic projection for rotations up to
45°).

As  for  non-rigid  transformations,  an  interesting  general  extension  to  explore  is  where 
the  set  of  views  is  no  longer  a  linear  subspace,  but  still  occupies  a  low  dimensional 
manifold within a much higher dimensional space. This manifold resembles locally a
linear  subspace,  but  it  is  no  longer  “globally  straight”.  By  analogy,  one  can  visualize  the 
simple  linear  combinations  case  in  terms  of  a  3-D  space,  in  which  all  the  orthographic 
views  of  a  rigid  object  are  restricted  to  some  2-D  plane.  In  the  more  general  case  the 
plane  will  bend,  to  become  a  curved  2-D  manifold  within  the  3-D  space. 

This  appears  to  be  a  general  case  of  interest  for  recognition  as  well  as  for  other  learning 
tasks. For recognition to be feasible, the set of views $\{V_i\}$ corresponding to a given object
cannot be arbitrary, but must obey some constraints, e.g., in the form $F(V_i) = 0$. Under
general  conditions,  these  restrictions  will  define  locally  a  manifold  embedded  in  the  larger 
space.  Algorithms  that  can  learn  to  classify  efficiently  sets  that  form  low  dimensional 
manifolds  embedded  in  high  dimensional  spaces  will  therefore  be  of  general  value. 

Occlusion

In the linear combination scheme we assumed that the same set of points is visible in
the  different  views.  What  happens  if  some  of  the  object’s  points  are  occluded  by  either 
self-occlusion  or  by  other  objects? 

As we mentioned in Section 1.3.5, self-occlusion is handled by representing an object
not by a single model, but by a number of models covering its different “aspects”
[Koenderink & Van Doorn 1979].

As  for  occlusion  by  other  objects,  this  problem  is  handled  in  a  different  manner 
by  the  minimal  alignment  and  the  full  alignment  versions  of  the  LC  scheme.  In  the 
minimal  alignment  version,  a  small  number  of  corresponding  features  are  used  to  recover 
the  coefficients  of  the  linear  combination.  In  this  scheme,  occlusion  does  not  present  a 
major  special  difficulty.  After  computing  the  linear  combination,  a  good  match  will  be 
obtained between the transformed model and the visible part of the object, and recognition
may  proceed  on  the  basis  of  this  match.  (Alignment  search  will  behave  in  a  similar 
manner.) 

In the linear mapping version, an object’s view is represented by a vector $V_i$ of its
coordinates.  Due  to  occlusion,  some  of  the  coordinates  will  remain  unknown.  A  way  of 
evaluating  the  match  in  this  case  in  an  optimal  manner  is  suggested  in  Appendix  D. 

Multiple  models 

We  have  considered  above  primarily  the  problem  of  matching  a  viewed  object  with  a 
single  model.  If  there  are  many  candidate  models,  a  question  arises  regarding  the  scaling 
of  the  computational  load  with  the  number  of  models. 

In  the  LC  scheme,  the  main  problem  is  in  the  stage  of  performing  the  correspondence, 
since  the  subsequent  testing  of  a  candidate  model  is  relatively  straightforward.  The 
linear  mapping  scheme  is  particularly  attractive  in  this  regard:  once  the  correspondence 
is  known,  the  testing  of  a  model  requires  only  a  multiplication  of  a  matrix  by  a  vector. 

With respect to the correspondence stage, the question is how to perform correspondence
efficiently with multiple models. This problem remains open for future study; we
just comment here on a possible direction. The idea is to use pre-alignment to a prototype
in the following manner. Suppose that $M_1, \ldots, M_k$ is a family of related models. A
single model $M$ will be used for representing this set for the purpose of alignment.
The correspondence $T_i$ between each $M_i$ in the set and $M$ is pre-computed. Given
an observed object $P$, a single correspondence $T : M \to P$ is computed. The individual
transformations $M_i \to P$ are computed by the compositions $T \circ T_i$.
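
The composition step can be sketched under a simplifying assumption: each correspondence is stored as a list of indices, so that entry `n` gives the index of the point matched to point `n`. This representation, the function name `compose`, and the sample index lists are all hypothetical.

```python
def compose(outer, inner):
    """Return the correspondence outer o inner for index-list correspondences."""
    return [outer[j] for j in inner]


# Pre-computed offline: point n of model M_i matches point T_i[n] of the prototype M.
prototype_maps = {
    "M_1": [2, 0, 1, 3],
    "M_2": [0, 1, 3, 2],
}

# Computed once per observed object: prototype point n matches image point T[n].
T = [3, 2, 0, 1]

# Each model-to-image correspondence is then obtained by a cheap composition.
model_to_image = {name: compose(T, T_i) for name, T_i in prototype_maps.items()}
```

Only the single correspondence `T` has to be established at recognition time; the per-model work reduces to one list lookup per point.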


3.3 General conclusions


In this section we summarize briefly a number of general characteristics of the linear
combinations scheme. In this scheme, as in some other alignment schemes, significant
aspects of visual object recognition are more low-level in nature and more pictorial compared
with structural description recognition approaches [e.g., Biederman 1985]. The
scheme uses directly 2-D views rather than an explicit 3-D model. The use of the 2-D
views is different, however, from a simple associative memory [Abu-Mostafa & Psaltis
1987] where new views are simply compared in parallel to all previously stored views.
Rather than measuring the distance between the observed object and each of the stored
views, a distance is measured from the observed object to the linear subspace (or a low
dimensional manifold) defined by previous views.
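
The difference between the two kinds of distance can be made concrete with a short sketch. It assumes that each view is given as a plain coordinate vector and uses Gram-Schmidt orthogonalization to project onto the span of the stored views; the function names are hypothetical and the data are made up.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(views, tol=1e-12):
    """Gram-Schmidt basis for the span of the stored views."""
    basis = []
    for v in views:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:                       # skip views already in the span
            basis.append([wi / norm for wi in w])
    return basis


def distance_to_subspace(views, observed):
    """Distance from the observed vector to the linear subspace spanned by the views."""
    residual = list(observed)
    for b in orthonormal_basis(views):
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return math.sqrt(dot(residual, residual))


def distance_to_nearest_view(views, observed):
    """Distance used by a simple associative memory: nearest stored view."""
    return min(math.dist(v, observed) for v in views)


stored = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
novel = [3.0, -2.0, 0.0, 0.0]                # a combination of the stored views

subspace_distance = distance_to_subspace(stored, novel)
nearest_distance = distance_to_nearest_view(stored, novel)
```

In this example the novel view lies in the span of the stored views, so its distance to the subspace vanishes even though it is far from every individual stored view.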

The  linear  combination  scheme  “reduces”  the  recognition  problem  in  a  sense  to  the 
problem of establishing a correspondence between the viewed object and candidate models.
The method demonstrates that if a correspondence can be established, the remaining
computation  is  relatively  straightforward.  Establishing  a  reliable  correspondence  between 
images  is  not  an  easy  task,  but  it  is  a  general  task  solved  by  the  visual  system  (e.g.  in 
motion  measurement  and  stereoscopic  vision),  and  related  processes  may  also  be  involved 
in visual object recognition.


Appendix A

In  section  1.4.2  we  showed  that  the  images  of  an  object  with  smooth  surfaces  rotating 
in  3-D  space  can  be  represented  as  the  linear  combination  of  five  views,  and  mentioned 
that  the  coefficients  for  these  linear  combinations  satisfy  seven  functional  constraints.  In 
this  appendix  we  list  these  constraints. 

We use the same notation as in section 1.4.2. Let $R_1, \ldots, R_5, R$ be $3 \times 3$ rotation matrices,
and $R'_1, \ldots, R'_5, R'$ be the corresponding $2 \times 5$ matrices defined in section 1.4.2. Let
$\mathbf{r}_1, \ldots, \mathbf{r}_5, \mathbf{r}$ be the first row vectors, and $\mathbf{s}_1, \ldots, \mathbf{s}_5, \mathbf{s}$ the second row vectors of $R'_1, \ldots, R'_5, R'$,
respectively. In section 1.4.2 we showed that each of the two row vectors of $R'$ is a linear
combination of the corresponding row vectors of $R'_1, R'_2, \ldots, R'_5$. That is,

$$ \mathbf{r} = \sum_{i=1}^{5} a_i \mathbf{r}_i $$

$$ \mathbf{s} = \sum_{i=1}^{5} b_i \mathbf{s}_i $$

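The functional constraints can be expressed as (with $r_j$ and $s_j$ denoting the $j$-th
components of the row vectors $\mathbf{r}$ and $\mathbf{s}$):

$$
\begin{aligned}
r_1^2 + r_2^2 + r_3^2 &= 1 \\
s_1^2 + s_2^2 + s_3^2 &= 1 \\
r_1 s_1 + r_2 s_2 + r_3 s_3 &= 0 \\
r_1 + r_4 &= s_2 + s_5 \\
r_2 + r_5 &= -(s_1 + s_4) \\
(r_1 + r_4)^2 + (s_1 + s_4)^2 &= 1 \\
r_4 s_5 &= s_4 r_5
\end{aligned}
$$

(Constraints 1, 2, 3 and 7 are immediate. Constraints 4, 5, 6 can be verified by expressing
all the entries in terms of the rotation angles $\alpha, \beta, \gamma$.)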

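To express these constraints as a function of the coefficients, every occurrence of a
term $r_j$ or $s_j$ should be replaced by the appropriate linear combination, as follows
(where $(\mathbf{r}_i)_j$ denotes the $j$-th component of the row vector $\mathbf{r}_i$):

$$ r_j = \sum_{i=1}^{5} a_i (\mathbf{r}_i)_j $$

$$ s_j = \sum_{i=1}^{5} b_i (\mathbf{s}_i)_j $$

In the case of a similarity transformation (i.e., with scale change) the first two constraints
are substituted by:

$$ r_1^2 + r_2^2 + r_3^2 = s_1^2 + s_2^2 + s_3^2 $$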

Appendix  B 

In  this  appendix  we  describe  a  method  to  find  a  space  of  a  given  dimension,  that  lies  as 
close  as  possible  to  a  given  set  of  points. 

Let $\{p_1, p_2, \ldots, p_m\}$ be a set of points in $\mathcal{R}^n$. We would like to find the $(n - k)$
dimensional space that lies as close as possible (in the least-squares sense) to the points
$\{p_1, p_2, \ldots, p_m\}$. Let $P$ be the $n \times m$ matrix given by $(p_1, p_2, \ldots, p_m)$. Let $\{u_1, \ldots, u_n\}$ be
a set of orthonormal vectors in $\mathcal{R}^n$, and define $U_k = \mathrm{span}\{u_{k+1}, \ldots, u_n\}$. The sum of the
distances (squared) of the points $p_1, p_2, \ldots, p_m$ from $U_k$ is given by:

$$ D^2(U_k) = \sum_{i=1}^{k} \| P^t u_i \|^2 $$

(Since $\sum_{i=1}^{k} (p_j^t u_i)^2$ is the squared distance of $p_j$ from $U_k$.)

Let $F = P P^t$. Then:

$$ D^2(U_k) = \sum_{i=1}^{k} \| P^t u_i \|^2 = \sum_{i=1}^{k} (P^t u_i)^t (P^t u_i) = \sum_{i=1}^{k} u_i^t F u_i $$

Any real matrix of the form $X X^t$ is symmetric and non-negative. Therefore, $F$ has $n$
orthonormal eigenvectors and $n$ real non-negative eigenvalues. Assume that the $\{u_1, \ldots, u_n\}$ above are
the eigenvectors of $F$ with eigenvalues $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ respectively, then $F u_i = \lambda_i u_i$,
and therefore:

$$ D^2(U_k) = \sum_{i=1}^{k} \lambda_i $$

Claim: Let $\{\lambda_1, \ldots, \lambda_k\}$ be the $k$ smallest eigenvalues of $F$, then:

$$ \sum_{i=1}^{k} \lambda_i = \min_{V_k} D^2(V_k) $$

where the minimum is taken over all the linear subspaces of dimension $n - k$. Therefore,
$\mathrm{span}\{u_{k+1}, \ldots, u_n\}$ is the best $(n - k)$ dimensional space through $p_1, p_2, \ldots, p_m$.

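Proof: Let $V_k$ be a linear subspace of dimension $(n - k)$. We must establish that:

$$ D^2(V_k) \ge D^2(U_k) $$

Let $\{v_1, \ldots, v_n\}$ be a set of orthonormal vectors in $\mathcal{R}^n$ such that $V_k = \mathrm{span}\{v_{k+1}, \ldots, v_n\}$.
$V = (v_1, \ldots, v_n)$ and $U = (u_1, \ldots, u_n)$ are $n \times n$ orthonormal matrices. Let:

$$ R = U^t V $$

Then:

$$ U R = V $$

That is:

$$ v_j = \sum_{i=1}^{n} r_{ij} u_i $$

$R$ is also orthonormal, therefore:

$$ \sum_{i=1}^{n} r_{ij}^2 = \sum_{j=1}^{n} r_{ij}^2 = 1 $$

And therefore:

$$ F v_j = F \Big( \sum_{i=1}^{n} r_{ij} u_i \Big) = \sum_{i=1}^{n} r_{ij} \lambda_i u_i $$

$$ v_j^t F v_j = \Big( \sum_{i=1}^{n} r_{ij} u_i \Big)^t \Big( \sum_{i=1}^{n} r_{ij} \lambda_i u_i \Big) $$

Since $u_i^t u_j = \delta_{ij}$ we obtain that:

$$ v_j^t F v_j = \sum_{i=1}^{n} r_{ij}^2 \lambda_i $$

Therefore:

$$ D^2(V_k) = \sum_{j=1}^{k} v_j^t F v_j = \sum_{j=1}^{k} \sum_{i=1}^{n} r_{ij}^2 \lambda_i = \sum_{i=1}^{n} \Big( \sum_{j=1}^{k} r_{ij}^2 \Big) \lambda_i $$

Let:

$$ \alpha_i = \sum_{j=1}^{k} r_{ij}^2 $$

Then:

$$ D^2(V_k) = \sum_{i=1}^{n} \alpha_i \lambda_i $$

where $0 \le \alpha_i \le 1$ and $\sum_{i=1}^{n} \alpha_i = k$.

The claim we wish to establish is that the minimum is obtained when $\alpha_i = 1$ for
$i = 1 \ldots k$, and $\alpha_i = 0$ for $i = k+1 \ldots n$. Assume that for $V_k$ there exists $1 \le m \le k$ such
that $\alpha_m < 1$, and $k + 1 \le l \le n$ such that $\alpha_l > 0$. We can decrease $\alpha_l$ and increase $\alpha_m$
(by $\min(\alpha_l, 1 - \alpha_m)$), and this cannot increase the value of $D^2(V_k)$, because $\lambda_m \le \lambda_l$. By repeating this
process we will eventually reach the value of $D^2(U_k)$. Since during this process the value
cannot increase, we obtain that:

$$ D^2(U_k) \le D^2(V_k) $$

And therefore:

$$ \sum_{i=1}^{k} \lambda_i = \min_{V_k} D^2(V_k) $$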
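
The construction of this appendix can be sketched numerically. The sketch below builds $F = P P^t$, finds the eigenvectors with the largest eigenvalues, and reports $D^2$ as the sum of the remaining (smallest) eigenvalues. Power iteration with deflation is an illustrative choice of eigen-solver, not one prescribed by the text; the function names and the sample points are made up.

```python
import math


def scatter_matrix(points):
    """F = P P^t, where the columns of P are the points."""
    n = len(points[0])
    return [[sum(p[i] * p[j] for p in points) for j in range(n)] for i in range(n)]


def top_eigenpairs(F, count, iterations=500):
    """Largest `count` eigenpairs of a symmetric non-negative matrix,
    found by power iteration with deflation."""
    n = len(F)
    A = [row[:] for row in F]
    pairs = []
    for _ in range(count):
        u = [1.0 / (i + 1) for i in range(n)]            # arbitrary start vector
        for _step in range(iterations):
            w = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                break
            u = [x / norm for x in w]
        lam = sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))
        pairs.append((lam, u))
        for i in range(n):                                # deflate the found component
            for j in range(n):
                A[i][j] -= lam * u[i] * u[j]
    return pairs


def best_subspace(points, k):
    """Basis of the (n - k)-dimensional subspace closest to the points, and D^2."""
    F = scatter_matrix(points)
    n = len(F)
    pairs = top_eigenpairs(F, n - k)
    trace = sum(F[i][i] for i in range(n))
    d2 = trace - sum(lam for lam, _ in pairs)             # sum of the k smallest eigenvalues
    return [u for _, u in pairs], d2


# Five points in 3-D lying close to the plane z = 0.
points = [(1.0, 0.0, 0.1), (0.0, 1.0, -0.1), (1.0, 1.0, 0.0),
          (-1.0, 1.0, 0.05), (2.0, -1.0, 0.0)]
basis, d2 = best_subspace(points, k=1)
```

For these points the summed squared distance to the plane $z = 0$ is 0.0225, so by the claim the value of `d2` returned for the optimal plane cannot exceed that figure.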


Appendix C


In this appendix we establish that in the method using linear receptive fields the approximation
improves with the proportion of corresponding points within each receptive
field, and derive a bound on the error. We are given a set of points in the image
$\mathbf{p} = (p_1, \ldots, p_{\tilde{n}})$ that fall within a given receptive field, and $k$ sets of model points
$\mathbf{p}_1 = (p_{11}, \ldots, p_{1 n_1}), \ldots, \mathbf{p}_k = (p_{k1}, \ldots, p_{k n_k})$ that fall within the same receptive field. Let $\bar{p}$
be the average of $p_1, \ldots, p_{\tilde{n}}$, and $\bar{p}_i$ the average of $p_{i1}, \ldots, p_{i n_i}$ for every $1 \le i \le k$. We next
show that the difference between $\bar{p}$ and the linear combination of $\bar{p}_1, \ldots, \bar{p}_k$ is bounded
by a term which is proportional to the relative number of points within the receptive
field that have no correspondence.

Claim: For some given constants $a_1, \ldots, a_k$, let $l$ be the largest index such that for
every $1 \le j \le l$ it holds that $\sum_{i=1}^{k} a_i p_{ij} = p_j$. Denote $n = \max\{n_1, \ldots, n_k, \tilde{n}\}$,
$d = \max_{i,j,j'} \{ |p_{ij} - p_{ij'}|, |p_j - p_{j'}| \}$ and $q = 1 - l/n$, then:

$$ \Big| \bar{p} - \sum_{i=1}^{k} a_i \bar{p}_i \Big| \le q d \Big( 1 + \sum_{i=1}^{k} |a_i| \Big) $$

(where $d$ is the diameter of the receptive field).

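Proof: Let us first extend the sets of points in such a manner that each will have
the same number of points, $n$. We will do so by setting $p_{ij} = \bar{p}_i$ for every $1 \le i \le k$,
$n_i < j \le n$, and let $p_j = \bar{p}$ for every $\tilde{n} < j \le n$. We now have a new set of vectors
$\mathbf{p}_1, \ldots, \mathbf{p}_k, \mathbf{p}$ each of length $n$, all having the same averages they had originally. Therefore:

$$ \Big| \bar{p} - \sum_{i=1}^{k} a_i \bar{p}_i \Big| = \frac{1}{n} \Big| \sum_{j=1}^{n} \Big( p_j - \sum_{i=1}^{k} a_i p_{ij} \Big) \Big| = \frac{1}{n} \Big| \sum_{j=l+1}^{n} \Big( p_j - \sum_{i=1}^{k} a_i p_{ij} \Big) \Big| \le \frac{1}{n} \sum_{j=l+1}^{n} \Big| p_j - \sum_{i=1}^{k} a_i p_{ij} \Big| $$

Now, let $d_{ij} = p_{ij} - p_{i1}$ and $d_j = p_j - p_1$. Using $p_1 = \sum_{i=1}^{k} a_i p_{i1}$ (which holds when $l \ge 1$), we obtain:

$$
\begin{aligned}
\Big| \bar{p} - \sum_{i=1}^{k} a_i \bar{p}_i \Big|
&\le \frac{1}{n} \sum_{j=l+1}^{n} \Big| p_j - \sum_{i=1}^{k} a_i p_{ij} \Big| \\
&= \frac{1}{n} \sum_{j=l+1}^{n} \Big| p_1 + d_j - \sum_{i=1}^{k} a_i (p_{i1} + d_{ij}) \Big| \\
&= \frac{1}{n} \sum_{j=l+1}^{n} \Big| d_j - \sum_{i=1}^{k} a_i d_{ij} \Big| \\
&\le \frac{1}{n} \sum_{j=l+1}^{n} \Big( |d_j| + \sum_{i=1}^{k} |a_i| |d_{ij}| \Big)
\le q d \Big( 1 + \sum_{i=1}^{k} |a_i| \Big)
\end{aligned}
$$

Therefore, the difference $| \bar{p} - \sum_{i=1}^{k} a_i \bar{p}_i |$ is bounded by a term which is proportional to
$q$.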

From this claim we can conclude the following: Let $\bar{p}_1, \ldots, \bar{p}_k$ be the values obtained
by applying a linear receptive field to the pictures of a given model, and let $\bar{p}$ be the value
obtained by applying the same LRF to a given image. If the image can be represented
as a linear combination of the model pictures, then the error $\bar{p} - \sum_{i=1}^{k} a_i \bar{p}_i$ is bounded
by a term which is proportional to $q$. Therefore we can in principle reduce this term by
reducing $q$, that is, by constructing the LRF such that it will cover more corresponding
points of each picture.
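
The bound can be checked numerically on a small example. The sketch assumes that each point is a single scalar (for instance an x-coordinate), so that the absolute value plays the role of the distance, and it takes $d$ as the largest spread within any one of the point sets. The function name `lrf_bound_check` and the sample data are hypothetical.

```python
def lrf_bound_check(image_pts, model_pts, coeffs):
    """Return (error, bound): |p_bar - sum_i a_i p_bar_i| and q * d * (1 + sum_i |a_i|)."""
    mean = lambda xs: sum(xs) / len(xs)
    n = max(len(image_pts), *(len(m) for m in model_pts))
    shortest = min(len(image_pts), *(len(m) for m in model_pts))

    # l: number of leading points for which the linear combination holds exactly.
    l = 0
    while l < shortest and abs(sum(a * m[l] for a, m in zip(coeffs, model_pts))
                               - image_pts[l]) < 1e-12:
        l += 1

    d = max(max(s) - min(s) for s in [image_pts] + model_pts)   # largest spread in any set
    q = 1.0 - l / n                                             # fraction without correspondence
    error = abs(mean(image_pts) - sum(a * mean(m) for a, m in zip(coeffs, model_pts)))
    bound = q * d * (1.0 + sum(abs(a) for a in coeffs))
    return error, bound


model_1 = [1.0, 2.0, 3.0, 4.0]
model_2 = [2.0, 1.0, 0.0, 3.0]
coeffs = [0.5, 0.5]
# The first three image points correspond exactly; the fourth does not.
image = [1.5, 1.5, 1.5, 2.0]

error, bound = lrf_bound_check(image, [model_1, model_2], coeffs)
```

Here three of the four points correspond, so $q = 0.25$; replacing the stray fourth point by a corresponding one drives both $q$ and the error to zero.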


Appendix  D 

In  the  linear  mapping  method  a  matrix  L  was  constructed  that  maps  every  legal  view  v 
of  the  object  to  a  constant  output  vector.  If  the  common  output  is  chosen  to  be  the  zero 
vector,  then  Lv  =  0  for  any  legal  view  of  the  object. 

In  this  appendix  we  consider  briefly  the  case  where  the  object  is  only  partially  visible. 
We  model  this  situation  by  assuming  that  we  are  given  a  partial  vector  p.  In  this 
vector  the  first  k  coordinates  are  unknown,  due  to  the  occlusion,  and  only  the  last  n  —  k 
coordinates  are  observable.  (A  partial  correspondence  between  the  occluded  object  and 
the  model  is  assumed  to  be  known.) 

In  the  vector  p  we  take  the  first  k  coordinates  to  be  zero.  We  try  to  construct  from 
p  a  new  vector  p'  by  supplementing  the  missing  coordinates  so  as  to  minimize  ||  Lp'  ||. 
The  relation  between  p  and  p'  is: 

$$ p' = p + \sum_{i=1}^{k} a_i u_i $$

where the $a_i$ are unknown constants, and the $u_i$ are unit vectors along the first $k$ coordinates.

In  matrix  notation,  we  seek  to  complement  the  occluded  view  by  minimizing: 

$$ \min_{a} \| L p + L U a \| $$

where the columns of the matrix $U$ are the vectors $u_i$ and $a$ is the vector of the unknown
$a_i$'s.

The solution to this minimization problem is:

$$ a = -[LU]^{+} L p $$

(where $H^{+}$ denotes the pseudo-inverse of the matrix $H$). This means that the pseudo-inverse
$[LU]^{+}$ will have to be computed. The matrix $L$ is fixed, but $U$ depends on the
points that are actually visible.

This  optimal  value  of  a  can  also  be  used  to  determine  the  output  vector  of  the 
recognition  process  Lp': 

$$ L p' = (I - [LU][LU]^{+}) L p $$

p  is  then  recognized  as  a  legal  view  if  this  output  is  sufficiently  close  to  zero. 
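
The completion of an occluded view can be sketched as follows, under the assumption that $LU$ (the first $k$ columns of $L$) has full column rank, so that the pseudo-inverse solution coincides with the solution of the normal equations. The matrix `L` in the example is made up so that it maps every combination of two hypothetical model views to zero; the function names are hypothetical as well.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def complete_occluded_view(L, p, k):
    """Fill the first k (occluded, zeroed) coordinates of p so as to minimise ||L p'||."""
    rows, n = len(L), len(p)
    Lp = [sum(L[r][j] * p[j] for j in range(n)) for r in range(rows)]
    C = [[L[r][j] for j in range(k)] for r in range(rows)]       # C = L U
    gram = [[sum(C[r][i] * C[r][j] for r in range(rows)) for j in range(k)]
            for i in range(k)]
    rhs = [-sum(C[r][i] * Lp[r] for r in range(rows)) for i in range(k)]
    a = solve(gram, rhs)                                         # a = -(LU)^+ L p
    p_full = a + list(p[k:])
    output = [sum(L[r][j] * p_full[j] for j in range(n)) for r in range(rows)]
    return p_full, output


# Legal views are combinations of (1, 0, 1, 0) and (0, 1, 0, 1); L annihilates both.
L = [[1.0, 0.0, -1.0, 0.0],
     [0.0, 1.0, 0.0, -1.0]]

legal_partial = [0.0, 3.0, 2.0, 3.0]      # the view (2, 3, 2, 3) with its first coordinate occluded
illegal_partial = [0.0, 5.0, 2.0, 3.0]    # cannot be completed to a legal view

legal_full, legal_output = complete_occluded_view(L, legal_partial, k=1)
_, illegal_output = complete_occluded_view(L, illegal_partial, k=1)
```

The occluded coordinate of the legal view is recovered and the output vanishes, whereas the second vector leaves a non-zero output however the missing coordinate is filled in.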